The Quantum Reverse Shannon Theorem

Charles H. Bennett, Igor Devetak, Aram W. Harrow, Peter W. Shor and Andreas Winter 



Abstract — We show how to use entanglement and noiseless 
quantum or classical communication to simulate discrete mem- 
oryless quantum channels with unit fidelity and efficiency in 
the limit of large block size. When the sender and receiver 
share enough standard ebits and are promised that the input 
to the channels is a memoryless (or i.i.d.) quantum source, our 
simulation uses an asymptotic rate of communication equal to 
the entanglement-assisted capacity of the channel. This commu- 
nication rate also suffices for general (non-i.i.d.) sources if the 
ebits are replaced by a stronger entanglement resource, so-called 
entanglement-embezzling states, or if in addition to a supply 
of ebits, free backwards communication is allowed. Combined
with previous coding theorems for entanglement-assisted classical 
communication over quantum channels, our results establish 
the ability of any channels to simulate any other, with an 
asymptotic efficiency given by the ratio of their entanglement- 
assisted capacities. Our result can be used to prove a strong 
converse to the coding theorem for entanglement-assisted classical 
communication. 

We also give a regularized expression for the optimal 
communication-entanglement tradeoff for quantum channel sim- 
ulation when a limited rate of ebits is available. We compare 
these quantum results with the analogous classical reverse 
Shannon theorem, and the analogous tradeoff between forward 
communication and shared random bits required to simulate a 
classical channel. We consider these tradeoffs both for ordinary 
simulations and for "feedback" simulations, where in the classical 
case the sender receives a copy of the channel's output and in 
the quantum case the sender receives the channel's environment. 
I. Introduction

In classical information theory, Shannon's celebrated noisy 
channel coding theorem establishes the ability of any noisy
memoryless channel to simulate an ideal noiseless binary 
channel, and shows that its asymptotic efficiency or capacity 
for doing so is given by a simple expression 

$$C(N) = \max_p I(X;Y) = \max_p \left[ H(X) + H(Y) - H(X,Y) \right], \tag{1}$$

where H is the entropy, X the input random variable and
$Y = N(X)$ the induced output variable. The capacity, in
other words, is equal to the maximum, over input distributions,
of the input:output mutual information for a single use of
the channel. Recently, a dual theorem, the classical "reverse 
Shannon theorem" was proved [9], [10], which states that for 
any channel N of capacity C, if the sender and receiver share 
an unlimited supply of random bits, an expected $Cn + o(n)$
uses of a noiseless binary channel are sufficient to exactly
simulate n uses of the channel. In [51] a version of this
construction is given which achieves asymptotically perfect
simulation and works on a uniform blocksize $Cn + o(n)$.
Together, these theorems show that in the presence of shared 
randomness, the asymptotic properties of a classical channel 
can be characterized by a single parameter, its capacity; with
all channels of equal capacity being able to simulate one 
another with unit asymptotic efficiency in the presence of 
shared randomness. In [10] a quantum analog of the reverse 
Shannon theorem was conjectured, according to which quantum
channels should be characterizable by a single parameter
in the presence of shared entanglement between sender and 
receiver. 
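
As a numerical illustration of Eq. (1), the following minimal sketch evaluates $I(X;Y) = H(X) + H(Y) - H(X,Y)$ and approximates the capacity of a binary-input channel by a grid search over input distributions. The binary symmetric channel, the helper names and the grid resolution are assumptions chosen for illustration; they do not appear in the text.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def mutual_information(p_x, channel):
    """I(X;Y) = H(X) + H(Y) - H(X,Y), with channel[x][y] = Pr(Y=y | X=x)."""
    n_in, n_out = len(p_x), len(channel[0])
    joint = [p_x[x] * channel[x][y] for x in range(n_in) for y in range(n_out)]
    p_y = [sum(p_x[x] * channel[x][y] for x in range(n_in)) for y in range(n_out)]
    return entropy(p_x) + entropy(p_y) - entropy(joint)

def bsc(flip):
    """Binary symmetric channel with crossover probability `flip` (illustrative)."""
    return [[1 - flip, flip], [flip, 1 - flip]]

def capacity_binary_input(channel, steps=1000):
    """Approximate max over input distributions (p, 1-p) by a uniform grid."""
    return max(mutual_information([k / steps, 1 - k / steps], channel)
               for k in range(steps + 1))
```

For the binary symmetric channel the grid search should agree with the familiar closed form $1 - h(\text{flip})$, where $h$ is the binary entropy, since the optimum is attained at the uniform input.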

A (discrete memoryless) quantum channel can be viewed 
physically as a process wherein a quantum system, originating 
with a sender Alice, is split into a component for the receiver 
Bob and the inaccessible environment Eve. Mathematically it
can be viewed as an isometric embedding $\mathcal{N}^{A \to BE}$ of Alice's
Hilbert space (A) into the joint Hilbert space of Bob (B)
and Eve (E). Tracing out Eve yields a completely positive,
trace-preserving linear map on density operators from A to
B, which we denote $\mathcal{N}^{A \to B}$. Operationally, the two pictures
are equivalent, but we will sometimes find it convenient
mathematically to work with one or the other. The theory
of quantum channels is richer and less well understood than
that of classical channels. Unlike classical channels, quantum
channels have multiple inequivalent capacities, depending on 
what one is trying to use them for, and what additional 
resources are brought into play. These include 

• The ordinary classical capacity C, defined as the maximum
asymptotic rate at which classical bits can be
transmitted reliably through the channel, with the help
of a quantum encoder and decoder.

• The ordinary quantum capacity Q, which is the maximum 
asymptotic rate at which qubits can be transmitted under 
similar circumstances. 

• The private classical capacity P, which is the maximum 
rate at which classical bits can be transmitted to Bob 
while remaining private from Eve, who is assumed to 
hold the channel's environment E. 

• The classically assisted quantum capacity $Q_2$, which is
the maximum asymptotic rate of reliable qubit transmission
with the help of unlimited use of a 2-way classical
side channel between sender and receiver.

• The entanglement-assisted classical capacity $C_E$ [9],
[10], which is the maximum asymptotic rate of reliable
bit transmission with the help of unlimited pure state
entanglement between the sender and receiver.

Somewhat unexpectedly, the last of these has turned out
to be the simplest to calculate, being given by an expression
analogous to Eq. (1). In [10] (see also [36]) it was shown that

$$C_E(\mathcal{N}) = \max_\rho \left[ H(\rho) + H(\mathcal{N}(\rho)) - H\big((I \otimes \mathcal{N})(\Phi_\rho)\big) \right], \tag{2}$$

where $\rho$ runs over all density matrices on A and $\Phi_\rho^{RA}$ is
a purification of $\rho$ by a reference system R (meaning that
$\Phi_\rho^{RA}$ is a pure state and $\mathrm{Tr}_R \Phi_\rho^{RA} = \rho^A$). The entanglement-assisted
capacity formula Eq. (2) is formally identical to
Eq. (1), but with Shannon entropies replaced by von Neumann
entropies. We can alternately write the RHS of Eq. (2) as
$\max_\rho I(R;B)_\rho$, where $|\Psi\rangle = (I^R \otimes \mathcal{N}^{A \to BE})|\Phi_\rho\rangle$ and
$I(R;B)_\rho = I(R;B)_\Psi = H(R)_\Psi + H(B)_\Psi - H(RB)_\Psi = H(\Psi^R) + H(\Psi^B) - H(\Psi^{RB})$.
We will use $I(R;B)_\rho$ and
$I(R;B)_\Psi$ interchangeably, since the mutual information and
other entropic properties of $\Psi$ are uniquely determined by $\rho$.
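
To make the quantity $I(R;B)_\rho$ concrete, here is a minimal sketch that evaluates it for a single qubit channel and a single input state; it does not perform the maximization in Eq. (2). The amplitude damping channel (Kraus operators $K_0 = \mathrm{diag}(1, \sqrt{1-\gamma})$ and $K_1 = \sqrt{\gamma}\,|0\rangle\langle 1|$), the parametrization of $\rho$ and all function names are assumptions made for illustration. The code uses the purity of the state on RBE, which gives $H(R) = H(\rho)$ and $H(RB) = H(E)$, so only 2x2 density matrices are needed.

```python
import math

def entropy_2x2(a, b, d):
    """Von Neumann entropy (bits) of the density matrix [[a, b], [conj(b), d]],
    computed from the closed-form eigenvalues of a 2x2 Hermitian matrix."""
    tr, det = a + d, a * d - abs(b) ** 2
    disc = math.sqrt(max(tr * tr - 4 * det, 0.0))
    eigenvalues = [(tr + disc) / 2, (tr - disc) / 2]
    return -sum(l * math.log2(l) for l in eigenvalues if l > 1e-15)

def quantum_mutual_information(a, b, d, gamma):
    """I(R;B) = H(R) + H(B) - H(RB) for amplitude damping with strength gamma
    acting on the input rho = [[a, b], [conj(b), d]] (illustrative channel)."""
    s, t = math.sqrt(1 - gamma), math.sqrt(gamma)
    h_r = entropy_2x2(a, b, d)                                  # H(R) = H(rho)
    h_b = entropy_2x2(a + gamma * d, s * b, (1 - gamma) * d)    # Bob's output
    h_e = entropy_2x2(a + (1 - gamma) * d, t * b, gamma * d)    # Eve's output
    return h_r + h_b - h_e                                      # H(RB) = H(E)
```

With $\gamma = 0$ the channel is noiseless and the maximally mixed input gives $I(R;B) = 2$ bits; with $\gamma = 1$ Bob always receives $|0\rangle$ and $I(R;B) = 0$.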

Aside from the constraints $Q \le P \le C \le C_E$ and
$Q \le Q_2$, which are obvious consequences of the definitions,
and $Q_2 \le Q_E = \frac{1}{2} C_E$, which can be proved using the results
of [20] and [14], the five capacities appear to vary rather 
independently (see, for example, [6]). Except in special cases, 
it is not possible, without knowing the parameters of a channel, 
to infer any one of these capacities from the other four. 

This complex situation naturally raises the question of how 
many independent parameters are needed to characterize the 
important asymptotic, capacity-like properties of a general 
quantum channel. A full understanding of quantum channels 
would enable us to calculate not only their capacities, but more 
generally, for any two channels $\mathcal{M}$ and $\mathcal{N}$, the asymptotic
efficiency (possibly zero) with which $\mathcal{M}$ can simulate $\mathcal{N}$,
both alone and in the presence of auxiliary resources such 
as classical communication or shared entanglement. 

One motivation for studying communication in the presence 
of auxiliary resources is that it can simplify the classification 
of channels' capacities to simulate one another. This is so 
because if a simulation is possible without the auxiliary 
resource, then the simulation remains possible with it, though 
not necessarily vice versa. For example, Q and C represent a 
channel's asymptotic efficiencies of simulating, respectively, a 
noiseless qubit channel and a noiseless classical bit channel. 
In the absence of auxiliary resources these two capacities 
can vary independently, subject to the constraint $Q \le C$,
but in the presence of unlimited prior entanglement, the
relation between them becomes fixed: $C_E = 2Q_E$, because
entanglement allows a noiseless 2-bit classical channel to
simulate a noiseless 1-qubit channel and vice versa (via
teleportation [5] and superdense coding [11]). Similarly the 
auxiliary resource of shared randomness simplifies the theory 
of classical channels by allowing channels to simulate one 
another efficiently according to the classical reverse Shannon 
theorem. 

The various capacities of a quantum channel $\mathcal{N}$ may be
defined within a framework where asymptotic communication
resources and conversions between them are treated
abstractly [18]. A noisy channel $\mathcal{N}$ corresponds to an asymptotic
resource $\langle \mathcal{N} \rangle$, while standard resources such as ebits
(maximally-entangled pairs of qubits, also known as EPR
pairs), or instances of a noiseless qubit channel from Alice
to Bob are denoted $[qq]$ and $[q \to q]$ respectively. Their
classical analogues are $[cc]$ and $[c \to c]$, which stand for bits
of shared randomness (rbits), and uses of noiseless classical
bit channels (cbits). Communication from Bob to Alice is
denoted by $[q \leftarrow q]$ and $[c \leftarrow c]$. Within this framework,
coding theorems can be thought of as transformations from one
communication resource to another, analogous to reductions in
complexity theory, but involving resources that are quantitative
rather than qualitative, the quantity (if other than 1) being
indicated by a coefficient preceding the resource expression.
We consider two kinds of asymptotic resource reducibility
(called resource inequalities in [18]): viz. reducibility via local
operations, $\le_{LO}$, and via clean local operations, $\le_{CLO}$. A
resource $\beta$ is said to be LO-reducible to $\alpha$ if there is an
asymptotically faithful transformation from $\alpha$ to $\beta$ via local
operations: that is, for any $\epsilon, \delta > 0$ and for all sufficiently large
n, $n(1 + \delta)$ copies of $\alpha$ can be transformed into n copies of
$\beta$ with overall diamond-norm ([40] and see also [46]) error
$\le \epsilon$. The clean version of this reducibility, $\le_{CLO}$, which is
important when we wish to coherently superpose protocols,
adds the restriction that any quantum subsystem discarded
during the transformation be in the zero state up to an error
that vanishes in the limit of large n.
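
As a worked illustration of this notation, teleportation [5] and superdense coding [11] can be written as resource inequalities. The lines below are a sketch in the notation of this section; both protocols are exact on single copies, so no asymptotic slack is needed.

```latex
% Teleportation: two cbits plus one ebit simulate one noiseless qubit channel.
2\,[c \to c] + [qq] \;\ge_{LO}\; [q \to q]
% Superdense coding: one qubit channel plus one ebit simulate two cbits.
[q \to q] + [qq] \;\ge_{LO}\; 2\,[c \to c]
% With unlimited ebits the two dynamic resources are interconvertible:
[q \to q] + \infty\,[qq] \;=_{LO}\; 2\,[c \to c] + \infty\,[qq]
```

The last line is the resource-level statement behind the fixed ratio between entanglement-assisted classical and quantum capacities.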

For example, the coding theorem for entanglement-assisted
classical communication can be stated as

$$\langle \mathcal{N} \rangle + \infty [qq] \ge_{LO} C_E(\mathcal{N}) [c \to c], \tag{3}$$

where $C_E(\mathcal{N})$ is defined as in Eq. (2). Sometimes, where
there is no danger of confusion, $\le_{LO}$, $\ge_{LO}$ and $=_{LO}$ may
be written $\le$, $\ge$ and $=$, as in [18].

In this language, to simulate (resp. cleanly simulate) a
channel N is to find standard resources $\alpha$ (made up of qubits,
ebits, cbits and so on) such that $\langle N \rangle \le_{LO} \alpha$ (resp. $\langle N \rangle \le_{CLO} \alpha$). For
example, the simplest form of the classical reverse Shannon
theorem can be stated as $\langle N \rangle \le_{LO} C(N)[c \to c] + \infty [cc]$,
with $C(N)$ defined in Eq. (1).

We will also introduce notation for two refinements of the 
problem. First, we (still following [18]) define the relative 
resource $\langle \mathcal{N} : \rho \rangle$ as many uses of the channel $\mathcal{N}$ whose
accuracy is guaranteed only on average when n uses of $\mathcal{N}$
are fed an input of the form $\rho^{\otimes n}$. Most coding theorems
still apply to relative resources, once we drop the maximization
over input distributions. So for a classical channel
$\langle N : p \rangle \ge_{LO} I(X;Y)_p [c \to c]$ and for a quantum channel
$\langle \mathcal{N} : \rho \rangle + \infty [qq] \ge_{LO} I(R;B)_\rho [c \to c]$, with the last
term representing the quantum mutual information between 
the channel output B and the reference system R purifying 
the channel input, when the input was distributed according 
to density matrix p, as in Eq. (2). 

Second, we will consider simulating feedback channels. The 
classical version of a feedback channel has Alice obtain a copy 
of Bob's output $Y = N(X)$. We denote this form of a channel
by $N_F$ if the original channel is N. For a quantum channel,
we cannot give Alice a copy of Bob's output because of the
no-cloning theorem, but instead define a quantum feedback
channel as an isometry in which the part of the output that
doesn't go to Bob is retained by Alice, rather than escaping to
the environment. We denote this $\mathcal{N}_F^{A \to BE}$, where the subscript
F indicates that E is retained by Alice. Quantum feedback
is an example of quantum state redistribution [37] in which
the same global pure state is redistributed among a set of
parties. The redistribution corresponding to a feedback channel
$\mathcal{N}_F^{A \to BE}$ involves two active parties and a passive purifying
reference system R. The first party's share A of the initial state
$\Psi^{A:R}$ is split into two parts, E and B, with E remaining
with the first party, while B passes to a second party who
initially held nothing, leading to a final state $\Psi^{E:B:R}$.

Classical and quantum feedback are rather different notions. 
Indeed one might say opposite notions, since in quantum 
feedback Alice gets to keep everything but what Bob receives, 
and as a result quantum feedback is strictly stronger than 
classical feedback as a resource. Despite this, there are close 
parallels in how feedback affects the tradeoff between static 
resources (rbits, ebits) and dynamic resources (cbits, qubits) 
required for channel simulation. In both cases, when the static 
resource is restricted, simulating a non-feedback version of the 



channel requires less of the dynamic resource than simulating a 
feedback version, because the non-feedback simulation can be 
economically split into two sequential stages. For a feedback
simulation, no such splitting is possible. 

In this paper we consider what resources are required to 
simulate a quantum channel. In particular, one might hope 
to show, by analogy with the classical reverse Shannon theorem,
that $Q_E(\mathcal{N})$ qubits of forward quantum communication,
together with a supply of shared ebits, suffice to efficiently
simulate any quantum channel $\mathcal{N}$ on any input. This turns out
not to be true in general (see below), but it is true in some
important special cases:

• When the input is of tensor power form $\rho^{\otimes n}$, for some
$\rho$. In this case, we are simulating the relative resource
$\langle \mathcal{N} : \rho \rangle$.

• When the channel $\mathcal{N}$ has the property that its output
entropy $S(\mathcal{N}(\rho))$ is uniquely determined by the state
of the environment. Such channels include those with
classical inputs or outputs.

However, for general channels on general (i.e. non-tensor- 
power) inputs, we show that efficient simulation requires 
additional resources beyond ordinary entanglement. Any of 
the following resources will suffice: 

• more general forms of entanglement, such as an 
entanglement-embezzling state, in place of the supply of 
ordinary ebits, or 

• additional communication from Alice to Bob, or 

• backward classical or quantum communication, from Bob 
to Alice. 

The quantum reverse Shannon theorem is thus more fastidious 
than its classical counterpart. While classical shared random 
bits (rbits) suffice to make all classical channels equivalent 
and cross-simulable, standard ebits cannot do so for quantum 
channels. The reason is that quantum channels may require 
different numbers of ebits to simulate on different inputs. 
Therefore, to maintain coherence of the simulation across a 
superposition of inputs, the simulation protocol must avoid 
leaking to the environment these differences in numbers 
of ebits used. Fortunately, if the input is of tensor power 
form $\rho^{\otimes n}$, the entanglement "spread" required is rather small
($O(\sqrt{n})$), so it can be obtained at negligible additional cost
by having Alice initially share with Bob a slightly generous
number of ebits, then at the end of the protocol return the
unused portion for him to destroy. On non-tensor-power inputs
the spread may be $O(n)$, so other approaches are needed if
one is to avoid bloating the forward communication cost. If,
as in the second special case above, the channel itself already
leaks complete information about the output entropy to the 
environment, there is nothing more for the simulation to 
leak, so the problem becomes moot. Otherwise, there are 
several ways of coping with a large entanglement spread 
without excessive forward communication, including: 1) using 
a more powerful entanglement resource in place of standard 
ebits, namely a so-called entanglement-embezzling state, from 
which a variable amount of entanglement can be "borrowed" 
without leaving evidence of how much was taken, or 2) using 
a generous supply of standard ebits but supplementing the 
protocol by additional backward classical communication to
coherently "burn off" the unused ebits. We discuss the role of
entanglement spread in the quantum reverse Shannon theorem 
in Sec. II-C. 

Because the state redistribution performed by a quantum
feedback channel is asymptotically reversible, many of our
results concerning feedback channels can be expressed as
resource equivalences rather than reducibilities, for example

$$\langle \mathcal{N}_F : \rho \rangle =_{LO} \frac{1}{2} I(R;B) [q \to q] + \frac{1}{2} I(E;B) [qq], \tag{4}$$

indicating the numbers of qubits and ebits asymptotically
necessary and sufficient to perform the redistribution $\Psi^{A:R} \to \Psi^{E:B:R}$
on tensor powers of a source with density matrix $\rho^A$,
and the fact that any combination of resources asymptotically
able to perform the feedback simulation of $\mathcal{N}$ on $\rho$ can be
converted into the indicated quantities of qubits and ebits.

While the classical communication cost of simulating a 
classical channel in the low- or no-shared randomness regime 
is additive and given by a single-letter formula [54], [16], it 
is not known whether the same holds in the quantum low- 
entanglement regime. Another classical/quantum difference 
is that some classical channels require much less classical 
communication to simulate using shared entanglement than 
shared randomness [53]. 

II. Statement of results 

Figure 1 shows the parties, and corresponding random 
variables or quantum subsystems, involved in the operation 
of a discrete memoryless classical channel (top left) and a 
discrete memoryless quantum channel (top right). Dashed 
arrows indicate additional data flows characterizing a feed- 
back channel. The bottom of the figure gives flow diagrams 
for simulating such channels using, respectively, a classical 
encoder and decoder (bottom left) or a quantum encoder 
and decoder (bottom right). Shared random bits (rbits) and 
forward classical communication (cbits) are used to simulate
the classical channel; shared entanglement (ebits) and forward 
quantum communication (qubits) are used to simulate the 
quantum channel. As usual in Shannon theory, the encoder and 
decoder typically must operate in parallel on multiple inputs 
in order to simulate multiple channel uses with high fidelity. 

Where it is clear from context we will often use upper case
letters X, B, etc. to denote not only a classical random variable
(or quantum subsystem) but also its marginal probability distribution
(or density matrix) at the relevant stage of a protocol,
for example writing $H(B)$ instead of $H(\rho^B)$. Similarly we
write $I(E;B)$ for the quantum mutual information between
outputs E and B in the upper right side of Figure 1. However it
is not meaningful to write $I(A;B)$, because subsystems A and
B don't exist at the same time. Thus the conventional classical
notation $I(X;Y)$ for the input:output mutual information may
be considered to refer, in the quantum way of thinking, to the 
mutual information between Y and a copy of X, which could 
always have been made in the classical setting. 

Figure 2 shows some of the known results on communications
resources required to simulate classical and quantum
channels under various conditions.




Fig. 1. Parties and subsystems associated with classical and quantum
channels (top left and right, resp.) and with their simulation using standard 
resources (bottom left and right respectively). 
Fig. 2. Communication costs of simulating classical and quantum channels:
Some known results on the forward communication cost (c=cbits or q=qubits)
for simulating classical and quantum channels are tabulated as a function 
of the kind of source (tensor power or arbitrary), the kind of simulation 
(feedback or non-feedback), and the quantity of shared random bits (r) or 
ebits (e) available to assist simulation. For non tensor power quantum sources 
(green shaded cells), faithful entanglement-assisted simulation is not possible 
in general using ordinary ebits, because of the problem of entanglement 
spread. To obtain an efficient simulation in such cases requires additional 
communication (wlog backward classical communication), or a stronger form 
of entanglement resource than ordinary ebits, such as an entanglement- 
embezzling state. 

A. Classical Reverse Shannon Theorem 

Most of these results are not new; we collect them here 
for completeness, and give alternate proofs that will help 
prepare for the analogous quantum results. The high-shared- 
randomness and feedback cases below (a,b,e) were proved in 
[9], [10], [51]. The low- and zero-shared-randomness cases 
(c,d,f) were demonstrated by Cuff [16] building on Wyner's 
classic common randomness formula [54] for the communi- 
cation cost of generating correlated random variables. 

Theorem 1 (Classical Reverse Shannon Theorem (CRST)): 
Let N be a discrete memoryless classical channel with input
distribution X (a random variable) and induced output
$Y = N(X)$. We will use $I(X;Y)$ to indicate the mutual
information between input and output. Let $N_F$ denote
the feedback version of N, which gives Alice a copy
of Bob's output $Y = N(X)$. Trivially $\langle N \rangle \le_{LO} \langle N_F \rangle$ and
$\langle N : p \rangle \le_{LO} \langle N_F : p \rangle$ for all p.

(a) Feedback simulation on known sources with sufficient 
shared randomness to minimize communication cost: 



$$\langle N_F : p \rangle \le_{LO} I(X;Y) [c \to c] + H(Y|X) [cc] \tag{5}$$

In fact this is tight up to the trivial reduction
$[cc] \le_{LO} [c \to c]$. In other words, for c and r nonnegative,

$$\langle N_F : p \rangle \le_{LO} c [c \to c] + r [cc] \tag{6}$$

iff $c \ge I(X;Y)$ and $c + r \ge H(Y)$.

(b) Feedback simulation on general sources with sufficient
shared randomness to minimize communication cost:

$$\langle N_F \rangle \le_{LO} C(N) [c \to c] + \left( \max_p H(Y) - C(N) \right) [cc]. \tag{7}$$

(c) Non-feedback simulation on known sources, with limited 
shared randomness: When shared randomness is present 
in abundance, feedback simulation requires no more 
communication than ordinary non-feedback simulation, 
but when only limited shared randomness is available, 
the communication cost of non-feedback simulation can 
be less. This follows from the possibility of splitting the 
simulation into two stages with the second performed 
by Bob, and part of the first stage's randomness being 
recycled or derandomized. Since Alice does not get to 
see the output of the second stage, this is a non-feedback 
simulation. 



$$\langle N : p \rangle \le_{LO} c [c \to c] + r [cc] \tag{8}$$

if and only if there exists a random variable W with
$I(X;Y|W) = 0$, such that $c \ge I(X;W)$ and $c + r \ge I(XY;W)$.

(d) Non-feedback simulation on known sources with no 
shared randomness: A special case of (c) is the fact that 



$$\langle N : p \rangle \le_{LO} c [c \to c] \tag{9}$$

if and only if there exists W such that $I(X;Y|W) = 0$
and $c \ge I(XY;W)$.
(e) Feedback simulation on arbitrary sources, with arbitrary
shared randomness: For non-negative r and c,

$$\langle N_F \rangle \le_{LO} c [c \to c] + r [cc] \tag{10}$$

iff $c \ge C(N) = \max_p I(X;Y)$ and $r \ge \max_p H(Y) - \max_p I(X;Y)$.
Because the two maxima may be
achieved for different X the last condition is not simply
$r \ge H(Y|X)$.

(f) Without feedback we have, for non-negative r and c,

$$\langle N \rangle \le_{LO} c [c \to c] + r [cc] \tag{11}$$

if and only if for all X there exists W with $I(X;Y|W) = 0$,
such that $c \ge I(X;W)$ and $c + r \ge I(XY;W)$.



Parts (b,e,f) of the theorem reflect the fact that the cost of a 
channel simulation depends only on the empirical distribution 
or type class of the input, which can be communicated in 
at asymptotically negligible cost ($O(\log n)$ bits), and that
an i.i.d. source p is very likely to output a type p' with
$\|p - p'\|_1 \sim 1/\sqrt{n}$. Also note that in general the resource
reducibility Eq. (10) is not a resource equivalence because
$H(Y)$ and $I(X;Y)$ may achieve their maxima on different
X.
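
The two scaling claims about types can be checked numerically. The sketch below counts the types of length-n strings (at most $(n+1)^k$ for an alphabet of size k, hence $O(\log n)$ bits to name one) and computes the exact expected $L_1$ distance between the empirical type and the source distribution for a binary i.i.d. source. The function names and the restriction to a binary alphabet in the second function are assumptions made to keep the example short.

```python
import math

def type_description_bits(n, alphabet_size):
    """Bits needed to index the type (empirical distribution) of a length-n
    string: the number of types is C(n + k - 1, k - 1) for alphabet size k."""
    num_types = math.comb(n + alphabet_size - 1, alphabet_size - 1)
    return math.ceil(math.log2(num_types))

def expected_l1_deviation(n, p):
    """Exact E||p' - p||_1 for n i.i.d. bits with Pr(1) = p, 0 < p < 1.
    For a binary alphabet ||p' - p||_1 = 2 |k/n - p|, with k the number of ones.
    The binomial weights are evaluated in log space to avoid overflow."""
    total = 0.0
    for k in range(n + 1):
        log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
                   + k * math.log(p) + (n - k) * math.log(1 - p))
        total += math.exp(log_pmf) * 2 * abs(k / n - p)
    return total
```

Quadrupling n should roughly halve the expected deviation, consistent with the $1/\sqrt{n}$ scaling.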

As the following Fig. 3 shows, for feedback simulation 
on a fixed source, the tradeoff between communication and 
shared randomness is trivial: Beginning at the point $c = I(X;Y)$,
$r = H(Y|X)$ on the right, r can only be decreased
by the same amount as c is increased, so that $c = H(Y)$
when $r = 0$. By contrast, if the simulation is not required
to provide feedback to the sender, a generally nontrivial
tradeoff results, for which the amount of communication at
$r = 0$ is given by Wyner's common information expression
$\min\{I(XY;W) : W \text{ s.t. } I(X;Y|W) = 0\}$.



[Figure 3: plot of communication c against shared randomness r. Curves: "Simulation with Feedback", with $c_F(0) = H(Y)$, and "Simulation w/o Feedback", with $c(0) = \min I(XY;W)$ and $c(r) = \min \max\{I(XY;W) - r,\ I(X;W)\}$, both minima over W s.t. $I(X;Y|W) = 0$. Marked values: $I(X;Y)$ and $H(Y|X)$.]

Fig. 3. Classical communication (c) and shared randomness (r) tradeoff for
feedback and non-feedback simulations of a classical channel on a specified
source X (Theorem 1).
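
The feedback branch of this tradeoff follows directly from the two conditions in part (a) of Theorem 1, $c \ge I(X;Y)$ and $c + r \ge H(Y)$. The sketch below evaluates them for a given input distribution and channel matrix; the function names and the tolerance parameter are illustrative assumptions, and the non-feedback curve (which requires optimizing over W) is not attempted.

```python
import math

def entropy(probs):
    """Shannon entropy in bits of a probability vector (zeros are skipped)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def channel_entropies(p_x, channel):
    """Return (I(X;Y), H(Y), H(Y|X)) for channel[x][y] = Pr(Y=y | X=x)."""
    n_out = len(channel[0])
    p_y = [sum(px * row[y] for px, row in zip(p_x, channel)) for y in range(n_out)]
    h_y = entropy(p_y)
    h_y_given_x = sum(px * entropy(row) for px, row in zip(p_x, channel))
    return h_y - h_y_given_x, h_y, h_y_given_x

def feedback_simulable(c, r, p_x, channel, tol=1e-12):
    """Theorem 1(a): rates (c, r) suffice iff c >= I(X;Y) and c + r >= H(Y)."""
    i_xy, h_y, _ = channel_entropies(p_x, channel)
    return c >= i_xy - tol and c + r >= h_y - tol

def min_feedback_communication(r, p_x, channel):
    """Smallest communication rate allowed by Theorem 1(a) at randomness rate r."""
    i_xy, h_y, _ = channel_entropies(p_x, channel)
    return max(i_xy, h_y - r)
```

At $r = 0$ this returns $H(Y)$, and for any $r \ge H(Y|X)$ it returns $I(X;Y)$, matching the two endpoints described in the text.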

The converse to (a) follows from Shannon's original noisy
channel coding theorem, which states that $\langle N : p \rangle \ge I(X;Y)_p [c \to c]$.
A slight refinement [2], [3] implies that
$\langle N_F : p \rangle \ge I(X;Y)_p [c \to c] + H(Y|X)_p [cc]$.

Thus we have the following resource equivalences. 

Corollary 2: 

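$$\langle N_F : p \rangle =_{LO} I(X;Y) [c \to c] + H(Y|X) [cc] \tag{12}$$

$$\langle N_F : p \rangle + \infty [cc] =_{LO} \langle N : p \rangle + \infty [cc] =_{LO} I(X;Y) [c \to c] + \infty [cc] \tag{13}$$

$$\langle N_F \rangle + \infty [cc] =_{LO} \langle N \rangle + \infty [cc] =_{LO} \left( \max_p I(X;Y) \right) [c \to c] + \infty [cc] \tag{14}$$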

Remark: The task considered in case (d) above, of simulat- 
ing a channel on a known source by forward communication 
alone without shared randomness, is a variant of the problem 
originally considered by Wyner [54], who sought
the minimum rate of a source allowing two correlated random
variables X and Y to be generated from it by separate
decoders. He called this the common information between
X and Y, and showed it was given by $\min\{I(XY;W) : I(X;Y|W) = 0\}$.
Fig. 4. Two-stage non-feedback simulation of a classical channel, via a
Markov chain $X \to W \to Y$ allows a nontrivial tradeoff between forward
communication c and shared randomness r. A typical point on the optimal
tradeoff curve is shown. The second term, $H(W|X,Y)$, in the expression for
r represents the portion of the shared randomness in the first stage simulation
of $X \to W$ that can be recycled or derandomized.



B. Quantum Reverse Shannon Theorem (QRST)

Theorem 3 (Quantum Reverse Shannon Theorem): Let $\mathcal{N}$
be a quantum channel from $A \to BE$ and $\mathcal{N}_F$ the feedback
channel that results from giving system E to Alice. If we are
given an input density matrix $\rho^A$ then entropy quantities such
as $I(R;B)_\rho$ refer to the state $\Psi^{RBE} = (I^R \otimes \mathcal{N}^{A \to BE})(\Phi_\rho^{RA})$,
where $\Phi_\rho^{RA}$ is any state satisfying $\Phi_\rho^{A} = \rho^A$.

(a) Feedback simulation on known tensor power input, with 
sufficient ebits of entanglement to minimize the forward 
qubit communication cost: 

$$\forall \rho \quad \langle \mathcal{N}_F : \rho \rangle =_{LO} \frac{1}{2} I(R;B)_\rho [q \to q] + \frac{1}{2} I(E;B)_\rho [qq] \tag{15}$$

(b) Known tensor power input, non-feedback simulation, entanglement
possibly insufficient to minimize the forward
communication cost:

$$\langle \mathcal{N} : \rho \rangle \le_{LO} q [q \to q] + e [qq] \tag{16}$$

if and only if there exists an integer $n \ge 1$ and an
isometry $V : E^n \to E_A E_B$ such that

$$q \ge \frac{1}{2n} I(R^n; E_B B^n) \tag{17}$$

$$q + e \ge \frac{1}{n} H(B^n E_B) \tag{18}$$


(c) Known tensor power input, non-feedback, no entanglement:
This is obtained from setting $e = 0$ in case (b)
above. In this case, Eq. (17) is always dominated by
Eq. (18) and we have that

$$\langle \mathcal{N} : \rho \rangle \le_{LO} q [q \to q], \tag{19}$$

if and only if $q \ge \lim_{n \to \infty} \frac{1}{n} \min_V H(B^n E_B)$,
where the minimum is over isometries $V : E^n \to E_A E_B$.
The latter is a well-known quantity: it is
the regularized entanglement of purification (EoP) [48]
$E_P^\infty(\Psi^{RB}) = \lim_{n \to \infty} \frac{1}{n} E_P((\Psi^{RB})^{\otimes n})$ of the channel's
Choi-Jamiolkowski state $\Psi^{RB}$.
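(d) Arbitrary input, feedback simulation: For a communication
resource $\alpha$ in the sense of [18] comprising any
combination of ebits, embezzling states [€€], backward
cbits $[c \leftarrow c]$, and/or forward or backward qubits,

$$\alpha \ge_{LO} \langle \mathcal{N}_F \rangle \tag{20}$$

iff there exists a resource $\beta$ such that for all $\rho$,

$$\alpha \ge_{CLO} \langle \mathcal{N}_F : \rho \rangle + \beta. \tag{21}$$

Specifically, using embezzling states we have

$$\langle \mathcal{N}_F \rangle \le_{LO} Q_E(\mathcal{N}) [q \to q] + [\text{€€}] \tag{22}$$

and when considering back communication

$$\langle \mathcal{N}_F \rangle \le Q_E(\mathcal{N}) [q \to q] + C [c \leftarrow c] + \left( \max_\rho H(B)_\rho - Q_E(\mathcal{N}) \right) [qq] \tag{23}$$

iff $C \ge \max\left(0,\ \max_\rho H(B)_\rho - \min_\rho H(B|E)_\rho - C_E(\mathcal{N})\right)$.
Other examples are discussed in Sec. II-C.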
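(e) Arbitrary input, no feedback: Let $\hat{\mathcal{N}}^{A \to E}$ denote the
complementary channel to $\mathcal{N}$, and take $\hat{\mathcal{N}}^{-1}$ to be the set-valued
inverse of $\hat{\mathcal{N}}$: that is, $\hat{\mathcal{N}}^{-1}(\omega) = \{\rho : \hat{\mathcal{N}}(\rho) = \omega\}$.
Since Eve learns $\omega = \hat{\mathcal{N}}(\rho)$, our channel simulation does
not need to preserve coherence between inputs in different
sets $\hat{\mathcal{N}}^{-1}(\omega)$. Otherwise the answer is essentially the
same as in (d). Again we assume that $\alpha$ is a resource
comprising some combination of ebits, embezzling states
[€€], backward cbits $[c \leftarrow c]$, and/or forward or backward
qubits. Now, $\alpha \ge_{LO} \langle \mathcal{N} \rangle$ iff there exists an integer
$n \ge 1$ such that for every $\omega \in \mathrm{range}(\hat{\mathcal{N}}^{\otimes n})$, there exists
a resource $\beta_\omega$ and an isometry $V_\omega : E^n \to E_A E_B$ such
that for all $\rho \in (\hat{\mathcal{N}}^{\otimes n})^{-1}(\omega)$,

$$\alpha \ge_{CLO} \frac{1}{n} \langle V_\omega \circ \mathcal{N}_F^{\otimes n} : \rho \rangle + \beta_\omega. \tag{24}$$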



Part (a) of Theorem 3 can equivalently be stated as 

$$\langle \mathcal{N}_F : \rho \rangle =_{LO} I(R;B)_\rho [q \to qq] + H(B|R)_\rho [qq], \tag{25}$$

where $[q \to qq]$ denotes a co-bit [24], [18], which is equivalent
to $([q \to q] + [qq])/2$. The formulation in Eq. (25) is parallel
to the classical version in Eq. (12) if we replace quantum 
feedback with classical feedback, co-bits with cbits and ebits 
with rbits. 
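
The agreement between Eq. (25) and part (a) can be verified in a few lines. The derivation below is a sketch that assumes only the co-bit identity just stated and the purity of the state on RBE, which gives $H(RB) = H(E)$ and $H(R) = H(BE)$.

```latex
\begin{align*}
I(R;B)\,[q \to qq] + H(B|R)\,[qq]
  &= \tfrac{1}{2} I(R;B)\,[q \to q]
     + \left( \tfrac{1}{2} I(R;B) + H(B|R) \right) [qq], \\
\tfrac{1}{2} I(R;B) + H(B|R)
  &= \tfrac{1}{2}\bigl( H(R) + H(B) - H(RB) \bigr) + H(RB) - H(R) \\
  &= \tfrac{1}{2}\bigl( H(B) + H(RB) - H(R) \bigr) \\
  &= \tfrac{1}{2}\bigl( H(B) + H(E) - H(BE) \bigr)
   = \tfrac{1}{2} I(E;B).
\end{align*}
```

The ebit coefficient is therefore $\frac{1}{2} I(E;B)$, as in part (a). The same purity relations give $\frac{1}{2} I(R;B) + \frac{1}{2} I(E;B) = H(B)$, the quantum counterpart of $I(X;Y) + H(Y|X) = H(Y)$ in the classical feedback case.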

A weaker version of (a) was proven in a long unpublished 
and now obsolete version of the present paper. The full
statement of (a) has since been proved by Devetak [17] using 
his triangle of dualities among protocols in the "family tree" 
- see also [18]; by Horodecki et al. [37] as the inverse of the 
"mother" protocol, a coherent version of state merging; and
by Abeyesinghe et al. [1] in the context of a direct derivation 
of the "mother" protocol. We will present another proof of (a) 
in Sec. IV. 

To prove (b), first note that we can without loss of generality
make everything coherent, so that nothing is discarded to the
environment, but only to the local environments of A and B,
which we call $E_A$ and $E_B$. This means that the problem
of simulating $\langle \mathcal{N} : \rho \rangle$ can be reduced to the problem of
simulating $\frac{1}{n} \langle V \circ \mathcal{N}_F^{\otimes n} : \rho^{\otimes n} \rangle$ for an arbitrary isometry
$V : E^n \to E_A E_B$. Part (c) is simply a special case of (b),
and was proven in the case when $\mathcal{N}$ is a CQ channel (that
is, has classical inputs) by Hayashi [29]. It corresponds to
the regularized entanglement of purification [48] of $|\Psi\rangle$. In
both cases, the additivity problem (i.e. the question of whether
regularization is necessary) is open.

Proving, and indeed understanding, parts (d) and (e) will 
require the concept of entanglement spread, which we will 
introduce in Sec. II-C. At first glance, the statements of 
the theorem may appear unsatisfying in that they reduce the 
question of whether $\langle \mathcal{N} \rangle \le \alpha$ or $\langle \mathcal{N}_F \rangle \le \alpha$ to the question of
whether certain other clean resource reductions hold. However, 
according to part (a) of Theorem 3, the corresponding clean 
resource reductions involve the standard resources of qubits 
and ebits. As we will explain further in Sec. II-C, this will 
allow us to quickly derive statements such as Eq. (22) and 
Eq. (23). 

The proofs of parts (d) and (e) will be given in Sec. IV. 
To prove them, we restrict attention to the case when $\alpha$
is a combination of entanglement-embezzling states and/or
"standard" resources (qubits, cbits and ebits) with the exception
of forward classical communication. We believe that this
restriction is not necessary, but do not resolve this question in
the present paper.

Remark: Analogously to the low-shared randomness regime 
in classical channel simulation (Figure 4 and cases (c) and (d) 
of the CRST), simulating a non-feedback channel permits a 
nontrivial tradeoff between ebits and qubits, in contrast to the 
trivial tradeoff for feedback simulation. Unlike the classical 
case, the qubit cost in the zero- and low-entanglement regime 
is not known to be given by a single-letter formula. 

Remark: Interestingly, quantum communication or entanglement
can sometimes improve simulations of even classical
channels. In [53] an example of a classical channel is given
with d-dimensional inputs which requires $\Omega(\log d)$ classical
bits to simulate, but can be simulated quantumly using
$O(d^{-1/3})$ qubits of communication, asymptotically. Curiously,
the classical reverse Shannon theorem (Theorem 1) is a special
case of the quantum reverse Shannon theorem (Theorem 3)
only in the unlimited shared entanglement regime; one of the
open problems of this work is to understand how entanglement
can be more efficient than shared randomness in creating
correlated classical probability distributions. More generally,
what values of c, q, r, e are consistent with the reducibility
$\langle \mathcal{N} \rangle \leq c[c \to c] + q[q \to q] + r[cc] + e[qq]$? We know
how to convert this problem to the equivalent relative resource
problem with $\langle \mathcal{N} \rangle$ replaced with $\langle \mathcal{N} : \rho \rangle$, but we do not
have an answer for this problem either.



[Figure 5 plot. Horizontal axis: shared ebits e, with the value (1/2) I(E;B) marked. Labeled features: Wyner-like point; two-stage non-feedback simulation; feedback simulation.]



Fig. 5. Two-stage non-feedback simulation of a quantum channel (solid
red curve) on a specified input $\rho$, via an intermediate state W, makes
possible a nontrivial tradeoff between forward communication q and shared
entanglement e. By contrast, for a feedback simulation (right, dashed blue
curve) only a trivial tradeoff is possible, where any deficit in ebits below
the $\frac{1}{2} I(E;B)$ needed for optimal simulation must be compensated by an equal
increase in the number of qubits used.



C. Entanglement spread 

To understand parts (d) and (e) of Theorem 3, we need 
to introduce the idea of entanglement spread. This concept is 
further explored in [26], [34], but we review some of the key 
ideas here. 

If Alice's input is known to be of i.i.d. form $\rho^{\otimes n}$ then
we know that the channel simulation can be done using
$\frac{1}{2} I(R;B)[q \to q] + \frac{1}{2} I(B;E)[qq]$. To see the complications
that arise from a general input, it suffices to consider the case
when Alice's input is of the form $(\rho_1^{\otimes n} + \rho_2^{\otimes n})/2$. We omit
explicitly describing the reference system, but assume that
Alice's input is always purified by one and that the fidelity
of any simulation is with respect to this purification.

Assume that $\rho_1^{\otimes n}$ and $\rho_2^{\otimes n}$ are nearly perfectly
distinguishable and that the channel simulation should not break
the coherence between these two states. Naively, we might
imagine that Alice could first determine whether she holds $\rho_1^{\otimes n}$
or $\rho_2^{\otimes n}$ and coherently store this in a register $i \in \{1, 2\}$. Next
she could conditionally perform the protocol for i.i.d. inputs
that uses $\frac{1}{2} I(R;B)_{\rho_i}[q \to q] + \frac{1}{2} I(B;E)_{\rho_i}[qq]$. To use a
variable amount of communication, it suffices to be given the
resource $\max_i \frac{1}{2} I(R;B)_{\rho_i}[q \to q]$, and to send $|0\rangle$ states when
we have excess channel uses. But unwanted entanglement
cannot in general be thrown away so easily. Suppose that
$I(B;E)_{\rho_1} > I(B;E)_{\rho_2}$, so that simulating the channel on
$\rho_1^{\otimes n}$ requires a higher rate of entanglement consumption than
$\rho_2^{\otimes n}$. Then it is not possible to start with $\frac{1}{2} n I(B;E)_{\rho_1}$ (or
indeed any number) of EPR pairs and perform local operations
to obtain a superposition of $\frac{1}{2} n I(B;E)_{\rho_1}$ EPR pairs and
$\frac{1}{2} n I(B;E)_{\rho_2}$ pairs.

The general task we need to accomplish is to coherently
create a superposition of different amounts of entanglement.
Often it is convenient to think about such superpositions as
containing a small "label" register that describes how many
EPR pairs are in the rest of the state. For example, consider the
state

$$|\psi\rangle = \sum_{i=1}^{m} \sqrt{p_i}\, |i\rangle^A |i\rangle^B |\Phi\rangle^{\otimes n_i} |00\rangle^{\otimes (N - n_i)}, \tag{26}$$

where $0 \leq n_i \leq N$ for each i. Crudely speaking^, we say that
$\max_i n_i - \min_i n_i$ is the amount of entanglement spread in
the state $|\psi\rangle$.

A more precise and general way to define entanglement
spread for any bipartite state $|\psi\rangle$ is (following [34]) as
$\Delta(\psi^A) = S_0(\psi^A) - S_\infty(\psi^A)$, where $S_0(\rho) = \log \mathrm{rank}\, \rho$ and
$S_\infty(\rho) = -\log \|\rho\|_\infty$. (The quantities $S_0$ and $S_\infty$ are also
known as $H_{\max}$ and $H_{\min}$ respectively. Alternatively, they can
be interpreted as Renyi entropies.) Ref. [34] also defined an
$\epsilon$-smoothed version of entanglement spread by

$$\Delta_\epsilon(\rho) = \min\{\Delta(\sigma) : 0 \leq \sigma \leq \rho,\ \mathrm{Tr}\,\sigma \geq 1 - \epsilon\}$$

that reflects the communication cost of approximately preparing
$|\psi\rangle$. More precisely, we have

Theorem 4 (Theorem 8 of [34]): If $|\psi\rangle^{AB}$ can be created from
EPR pairs using C cbits of communication and error $\leq \epsilon = \delta^8/4$,
then

$$C \geq \Delta_\delta(\psi^A) + \log(1 - \delta) \tag{27}$$
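As a rough numerical illustration of the unsmoothed spread $S_0 - S_\infty$, the following sketch evaluates it for the reduced spectrum of a label-register state of the kind described above: branch i has weight $p_i$ and $n_i$ EPR pairs, so Alice's reduced state has $2^{n_i}$ eigenvalues equal to $p_i / 2^{n_i}$. The function names are illustrative assumptions, logarithms are taken base 2, and no smoothing is attempted.

```python
import math

def entanglement_spread(eigenvalues, tol=1e-12):
    """Unsmoothed spread S_0 - S_inf (base-2 logs) of a reduced-state spectrum."""
    support = [lam for lam in eigenvalues if lam > tol]
    s0 = math.log2(len(support))        # S_0 = log rank
    s_inf = -math.log2(max(support))    # S_inf = -log of the largest eigenvalue
    return s0 - s_inf

def label_state_spectrum(weights, pair_counts):
    """Spectrum of Alice's reduced state for a superposition of n_i EPR pairs.

    Branch i carries weight p_i and n_i EPR pairs, contributing 2**n_i
    eigenvalues equal to p_i / 2**n_i (the |00> padding adds nothing).
    """
    spectrum = []
    for p, n_i in zip(weights, pair_counts):
        spectrum.extend([p / 2 ** n_i] * 2 ** n_i)
    return spectrum

# Equal superposition of 4 and 10 EPR pairs.
spread = entanglement_spread(label_state_spectrum([0.5, 0.5], [4, 10]))
```

For this example the value is $\log_2(1040) - 5$, slightly above 5, which is of the same order as the crude estimate $\max_i n_i - \min_i n_i = 6$; a flat spectrum gives zero spread.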

There are a few different ways of producing entanglement
spread, which are summarized in [26]. For example, one cbit
can be used to coherently eliminate one ebit, or to do nothing;
and since both of these tasks can be run in superposition,
this can also be used to create entanglement spread. Likewise
one qubit can coherently either create or disentangle one
ebit. To put this on a formal footing, we use the clean
resource reducibility $\leq_{\mathrm{CLO}}$ (called $\stackrel{\mathrm{clean}}{\leq}$ in [26]). A resource
$\beta$ is said to be "cleanly LO-reducible" to $\alpha$ iff there is an
asymptotically faithful clean transformation from $\alpha$ to $\beta$ via
local operations: that is, for any $\epsilon, \delta > 0$ and for all sufficiently
large n, $n(1 + \delta)$ copies of $\alpha$ can be transformed by local
operations into n copies of $\beta$ with overall diamond-norm
error $\leq \epsilon$, and moreover, any quantum subsystem discarded
during the transformation is in a standard $|0\rangle$ state, up to an
error vanishing in the limit of large n. In particular, entangled
states cannot be discarded. This restriction on discarding states
means that clean protocols can be safely run in superposition.

Finally, we can define the clean entanglement capacity of a
resource $\alpha$ to be the set $E_{\mathrm{clean}}(\alpha) = \{E : \alpha \geq_{\mathrm{CLO}} E[qq]\}$. By
time-sharing, we see that $E_{\mathrm{clean}}(\alpha)$ is a convex set. However,
it will typically be bounded both from above and below,
reflecting the fact that coherently undoing entanglement is a
nonlocal task. The clean entanglement capacities of the basic
resources are

$$\begin{aligned}
E_{\mathrm{clean}}([q \to q]) &= E_{\mathrm{clean}}([q \leftarrow q]) = [-1, 1] \\
E_{\mathrm{clean}}([c \to c]) &= E_{\mathrm{clean}}([c \leftarrow c]) = [-1, 0] \\
E_{\mathrm{clean}}([qq]) &= \{1\}
\end{aligned} \tag{28}$$

These resources can be combined in various ways to create
entanglement spread. For example, to create a superposition
of 4 and 10 EPR pairs, we might start with 8 ebits, use two
cbits to create a superposition of 6 and 8 ebits and then use
two qubits to create the desired superposition of 4 and 10 ebits.

^This neglects the entanglement in the $|i\rangle$ register. However, in typical
applications, this will be logarithmic in the total amount of entanglement.
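The arithmetic in this example can be sketched with interval bookkeeping. The sketch below assumes (as the example itself implicitly does) that the clean entanglement capacities of independently supplied resources combine by adding interval endpoints; the names `QUBIT`, `CBIT`, `EBIT`, `scale` and `combine` are illustrative, not notation from the paper.

```python
# Clean entanglement capacities of the basic resources, as closed intervals
# (lowest, highest) of net ebits that one use can produce.
QUBIT = (-1, 1)   # one qubit can create or destroy one ebit
CBIT = (-1, 0)    # one cbit can destroy one ebit or do nothing
EBIT = (1, 1)     # one ebit is exactly one ebit

def scale(count, interval):
    """Capacity interval of `count` (>= 0) copies of a resource."""
    return (count * interval[0], count * interval[1])

def combine(*intervals):
    """Assumed rule: capacities of independent resources add endpoint-wise."""
    return (sum(i[0] for i in intervals), sum(i[1] for i in intervals))

after_cbits = combine(scale(8, EBIT), scale(2, CBIT))    # reachable range (6, 8)
after_qubits = combine(after_cbits, scale(2, QUBIT))     # reachable range (4, 10)
```

The endpoints of the final interval are the two branch sizes in the text: 4 and 10 ebits.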

While this framework gives us a fairly clear understanding
of the communication resources required to create entanglement
spread, it also shows how unlimited EPR pairs are
not a good model of unlimited entanglement. Instead, we
will use the so-called entanglement-embezzling [49] states
$|\varphi_N\rangle^{AB}$, which are parameterized by their Schmidt rank N,
and can be used catalytically to produce or destroy any
Schmidt rank k state up to an error of k/N in the trace
norm. See [49] for a definition of $|\varphi_N\rangle$ and a proof of their
entanglement-embezzling abilities. We let the resource [€€]
denote access to an embezzling state of arbitrary size: formally,
[€€] = $\bigcup_{N \geq 1} \langle \varphi_N \rangle$ and

$E_{\mathrm{clean}}$([€€]) $= (-\infty, \infty)$.

By the above discussion, this is strictly stronger than the
resource $\infty[qq]$.

We now return to parts (d) and (e) of Theorem 3. In (d),
we need to run the simulation protocol for $\langle \mathcal{N}_F : \rho \rangle$ for all
possible $\rho$ in superposition.^ We can discard a resource $\beta$ at
the end of the protocol, but $\beta$ must be either independent
of $\rho$ for a feedback simulation or can depend only on $\mathcal{N}(\rho)$
for a non-feedback simulation. By the equality in Eq. (15),
this reduces to producing coherent superpositions of varying
amounts of qubits and ebits.

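The simplest case is when $\alpha = Q(\mathcal{N})[q \to q]$ + [€€]. In
this case, $\alpha \geq_{\mathrm{CLO}} Q(\mathcal{N})[q \to q] + E[qq]$ + [€€] for any E.
Thus we can take $\beta$ = [€€] and we have $\alpha \geq_{\mathrm{CLO}} \langle \mathcal{N}_F : \rho \rangle + \beta$
for all $\rho$. This establishes Eq. (22).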

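The most general case without embezzling states is when

$$\alpha = Q_1[q \to q] + Q_2[q \leftarrow q] + C_2[c \leftarrow c] + E[qq]. \tag{29}$$

In this case, we always have the constraint

$$Q_1 \geq Q(\mathcal{N}) = \max_\rho \tfrac{1}{2} I(R;B)_\rho, \tag{30}$$

since $Q_1[q \to q]$ is the only source of forward communication.
Suppose that $\beta = (E - e)[qq]$, for some $0 \leq e \leq E$.
Now, for each $\rho$ we require that
$\alpha \geq_{\mathrm{CLO}} \frac{1}{2} I(R;B)_\rho [q \to q] + (\frac{1}{2} I(E;B)_\rho + E - e)[qq]$. Equivalently

$$(Q_1 - \tfrac{1}{2} I(R;B)_\rho)[q \to q] + Q_2[q \leftarrow q] + C_2[c \leftarrow c] \geq_{\mathrm{CLO}} (\tfrac{1}{2} I(E;B)_\rho - e)[qq]. \tag{31}$$

We can calculate when Eq. (31) holds by using the spread
capacity expressions in Eq. (28). First, if $\frac{1}{2} I(E;B)_\rho - e \geq 0$
then the $C_2[c \leftarrow c]$ is not helpful and we simply have
$Q_1 - \frac{1}{2} I(R;B)_\rho + Q_2 \geq \frac{1}{2} I(E;B)_\rho - e$, or equivalently

$$Q_1 + Q_2 \geq H(B)_\rho - e.$$

Alternatively, if $\frac{1}{2} I(E;B)_\rho - e < 0$ then we have the inequality
$Q_1 - \frac{1}{2} I(R;B)_\rho + Q_2 + C_2 \geq e - \frac{1}{2} I(E;B)_\rho$, which is
equivalent to

$$Q_1 + Q_2 + C_2 \geq e - H(B|R)_\rho.$$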
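The two "equivalently" steps rest on entropy identities for a pure state on $RBE$. A short derivation, assuming only purity of the global state (so that $H(RB) = H(E)$ and $H(EB) = H(R)$), may be sketched as follows:

```latex
\begin{align*}
I(R;B) + I(E;B) &= [H(R) + H(B) - H(RB)] + [H(E) + H(B) - H(EB)] \\
                &= [H(R) + H(B) - H(E)] + [H(E) + H(B) - H(R)] = 2H(B), \\
I(R;B) - I(E;B) &= [H(R) - H(E)] - [H(E) - H(R)] = 2[H(R) - H(RB)] = -2H(B|R).
\end{align*}
```

Halving the first identity turns $Q_1 + Q_2 \geq \frac{1}{2} I(R;B)_\rho + \frac{1}{2} I(E;B)_\rho - e$ into $Q_1 + Q_2 \geq H(B)_\rho - e$, and halving the second turns $Q_1 + Q_2 + C_2 \geq e + \frac{1}{2} I(R;B)_\rho - \frac{1}{2} I(E;B)_\rho$ into $Q_1 + Q_2 + C_2 \geq e - H(B|R)_\rho$.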

^For technical reasons, our coding theorem will adopt a slightly different
approach. But for the converse and for the present discussion, we can consider
general inputs to be mixtures of tensor power states.

We will consider the case when E is sufficiently large so that
it does not impose any constraints on the other parameters.
This results in the bound

$$2(Q_1 + Q_2) + C_2 \geq \max_\rho H(B)_\rho - \min_\rho H(B|R)_\rho - C_E(\mathcal{N}). \tag{32}$$

It is important to note that the maximization of $H(B)$ and
the minimization of $H(B|R)$ on the RHS of Eq. (32) are
taken separately. Indeed, $C_E(\mathcal{N})$ is simply the maximization
of $H(B)_\rho - H(B|R)_\rho$ over all $\rho$, so the RHS of Eq. (32)
expresses how much larger this expression can be by breaking
up the maximization of those two terms.

Fortunately, each term in Eq. (32) is additive, so there is no
need to take the limit over many channel uses. The additivity
of $\max_\rho H(B)_\rho$ follows immediately from the subadditivity of the
von Neumann entropy, or equivalently the nonnegativity of
the quantum mutual information. The other two terms have
already been proven to be additive in previous work: [19]
showed that $\min_\rho H(B|R)_\rho = -\max_\rho H(B|E)_\rho$ is additive and
[10] showed that $C_E$ is additive. Thus, we again obtain a
single-letter formula in the case of unlimited EPR pairs.

The non-feedback case (e) of Theorem 3 adds one additional
subtlety: since the simulation gives part of the input to Eve, it
does not have to preserve superpositions between as many
different input density matrices. In particular, if the input
density matrix is $\rho^{\otimes n}$, then Eve learns $\mathcal{N}(\rho)$. Thus, we
need to run our protocol in an incoherent superposition over
different values of $\mathcal{N}(\rho)$ and then in a coherent superposition
within each $\mathcal{N}^{-1}(\omega)$. Here, even in the case of a fixed input
$\rho$, the additivity question is open. Until it is resolved, we
cannot avoid regularized formulas. However, conceptually part
(e) adds to part (d) only the issues of regularization and
optimization over ways of splitting $E^n$ into parts for Alice
and Bob.



D. Relation to other communication protocols 

Special cases of Theorem 3 include remote state preparation
[8] (and the qubit-using variant, super-dense coding of
quantum states [27]) for CQ-channels $\mathcal{N}(\rho) = \sum_j \langle j|\rho|j\rangle \sigma_j$;
the co-bit equality $[q \to qq] = ([q \to q] + [qq])/2$ [24];
measurement compression [52] (building on [42], [43]) for
qc-channels $\mathcal{N}(\rho) = \sum_j \mathrm{Tr}(\rho M_j)|j\rangle\langle j|$ where $(M_j)$ is a POVM;
entanglement dilution [4] for a constant channel $\mathcal{N}(\rho) = \sigma_0$;
and entanglement of purification (EoP) [48]: it was shown by
Hayashi [29] that optimal visible compression of mixed state
sources is given by the regularized EoP.

The Wyner protocol for producing a classical correlated
distribution [54] is a static analogue of the cbit-rbit tradeoff.
Similarly, the entanglement of purification is a static version
of the qubits-but-no-ebits version of the QRST.

For feedback channels, [17] showed that the QRST can 
be combined with the so-called "feedback father" to obtain 
the resource equivalence Eq. (15). On the other hand, [17] 
also showed that running the QRST backwards yields state 
merging, a.k.a. fully-quantum Slepian-Wolf. (The latter has 
been generalized further to state redistribution [?], [45].) 



III. Simulation of classical channels 
A. Overview 

This section is devoted to the proof of Theorem 1 (the 
classical reverse Shannon theorem). Previously parts (a,b) of 
Theorem 1 were proved in [10], [51] and its converse was 
proved in [51]. Here we will review their proof and show 
how it can be extended to cover the low-randomness case 
(parts (c,d,e) of Theorem 1). Similar results have been obtained 
independently in [16]. 

The intuition behind the reverse Shannon theorem can be
seen by considering a toy version of the problem in which all
probabilities are uniform. Consider a regular bipartite graph
with vertices divided into (X, Y) and with edges $E \subseteq X \times Y$.
Since the graph is regular, every vertex in X has degree
$|E|/|X|$ and every vertex in Y has degree $|E|/|Y|$. For $x \in X$,
let $\Gamma(x) \subseteq Y$ be its set of neighbors. We can use this to define
a channel from X to Y: define $N(y|x)$ to be $1/|\Gamma(x)| = |X|/|E|$
if $y \in \Gamma(x)$ and 0 if not. In other words, N maps
x to a random one of its neighbors. We call these channels
"unweighted" since their transition probabilities correspond to
an unweighted graph.

In this case, it is possible to simulate the channel N
using a message of size $\approx \log(|X| \cdot |Y| / |E|)$ and using
$\approx \log(|E|/|X|)$ bits of shared randomness. This can be thought
of as a special case of part (a) of Theorem 1 in which N is
an unweighted channel and we are only simulating a single
use of N. This is achieved by approximately decomposing N
into a probabilistic mixture of channels and using the shared
randomness to select which one to use. We will choose these
channels such that their ranges are disjoint subsets of Y, and
in fact, will construct them by starting with a partition of Y
and working backwards. The resulting protocol is analyzed in
the following Lemma.

Lemma 5: Consider a channel $N : X \to Y$ with
$N(y|x) = [y \in \Gamma(x)]/|\Gamma(x)|$, where $[y \in \Gamma(x)]$ is defined to be 1 if
$y \in \Gamma(x)$ and 0 if not. Choose positive integers r, m such
that $rm = |Y|$ and let $\gamma = m|E|/(|X||Y|)$. Choose a random
partition of Y into subsets $Y_1, \ldots, Y_r$, each of size m, and for
$y \in Y$ define $i(y)$ to be the index of the block containing y.
Define

$$\tilde{N}(y|x) = \frac{[y \in \Gamma(x)]}{r \cdot |\Gamma(x) \cap Y_{i(y)}|}$$

to be the channel that results from the following protocol:

1) Let $i \in [r]$ be a uniformly chosen random number shared
by Alice and Bob.

2) Given input x, Alice chooses a random element of
$\Gamma(x) \cap Y_i$ (assuming that one exists) and transmits its
index $j \in [m]$ to Bob.

3) Bob outputs the $j$-th element of $Y_i$.

Then it holds with probability $\geq 1 - |E| e^{-\gamma \epsilon^2/2}$ that
$\|\tilde{N}(\cdot|x) - N(\cdot|x)\|_1 \leq \epsilon$ for all x.

If we choose $\gamma = 2(\ln 2|E|)/\epsilon^2$ then there is a nonzero
probability of a good partition existing. In this case we can
derandomize the construction and simply say that a partition
of Y exists such that the above protocol achieves low error on
all inputs.
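The protocol of Lemma 5 can be explored on small graphs. The sketch below is only an illustration: it takes the graph and the partition as explicit inputs (no randomness), computes the simulated channel exactly with rationals, and reports the worst-case $\ell_1$ error over inputs. The function names and the 12-vertex circulant example are assumptions made for this illustration.

```python
from fractions import Fraction

def simulated_channel(neighbors, blocks):
    """Exact simulated channel of the partition protocol.

    neighbors: dict x -> set of neighbors Gamma(x) in Y
    blocks:    list of disjoint sets Y_1..Y_r partitioning Y
    Returns a dict x -> {y: probability}.
    """
    r = len(blocks)
    sim = {}
    for x, gamma_x in neighbors.items():
        row = {}
        for block in blocks:
            hits = set(gamma_x) & set(block)
            # With shared index i (prob. 1/r), Alice picks uniformly from hits.
            # If hits is empty the step is undefined and no mass is assigned.
            for y in hits:
                row[y] = Fraction(1, r * len(hits))
        sim[x] = row
    return sim

def worst_l1_error(neighbors, blocks):
    """max over x of the l1 distance between N(.|x) and the simulation."""
    sim = simulated_channel(neighbors, blocks)
    worst = Fraction(0)
    for x, gamma_x in neighbors.items():
        ideal = Fraction(1, len(gamma_x))
        err = sum(abs(ideal - sim[x].get(y, Fraction(0))) for y in gamma_x)
        worst = max(worst, err)
    return worst

# Hypothetical example: circulant graph on 12 + 12 vertices with degree 6.
graph = {x: {(x + k) % 12 for k in range(6)} for x in range(12)}
balanced = [set(range(0, 12, 2)), set(range(1, 12, 2))]
uneven = [{0, 1, 3, 6, 8, 10}, {2, 4, 5, 7, 9, 11}]
```

With the even/odd partition every $|\Gamma(x) \cap Y_i|$ equals $\gamma = 3$ and the simulation is exact; with the second partition the intersections range over 2, 3 and 4 and the worst-case error is 1/3, illustrating how fluctuations around $\gamma$ control the error.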
The idea behind the Lemma is that for each x and i, the
random variable $|\Gamma(x) \cap Y_i|$ has expectation close to
$|\Gamma(x)| \cdot |Y_i|/|Y| = \frac{|E|}{|X|} \cdot \frac{m}{|Y|} = \gamma$, with typical fluctuations on the
order of $\sqrt{\gamma}$. If $\gamma$ is large then these fluctuations are relatively
small, and the channel simulation is faithful. Similar "covering
lemmas" appeared in Refs. [51], [16], and were anticipated by
Ref. [23] and Thm 6.3 of [54]. The details of the proof are
described in Sec. III-B.

The difference between Lemma 5 and the classical reverse
Shannon theorem (i.e. part (a) of Theorem 1) is that we are
interested in an asymptotically growing number of channel
uses n and in simulating general channels N, instead of
unweighted channels. It turns out that when n is large, $N^n$
looks mostly like an unweighted channel, in a sense that we
will make precise in Sec. III-C. We will see that Alice need
communicate only $O(\log(n))$ bits to reduce the problem of
simulating $N^n$ to the problem of simulating an unweighted
channel. This will complete the proof of the direct part of
part (a) of Theorem 1.

One feature of the protocol in Lemma 5 is that Bob uses
only shared randomness (i) and the message from Alice (j)
in order to produce his output y. As a result, the protocol
effectively simulates the feedback channel $N_F$ in which Alice
also gets a copy of y. Conversely, in order to simulate a
feedback channel, Bob cannot use local randomness in any
significant way.

On the other hand, if Alice does not need to learn y, then
we can consider protocols in which some of the random bits
used are shared and some are local to Alice or Bob. This will
allow us to reduce the use of shared randomness at the cost
of some extra communication. The resulting trade-off between
the resources is given in part (c) of Theorem 1. In order to
prove it, we will again first consider the unweighted case.

The idea will be to decompose the channel $N(y|x)$ as
the composition of channels $N_1(w|x)$ and $N_2(y|w)$; i.e.
$N(y|x) = (N_2 \circ N_1)(y|x) = \sum_{w \in W} N_1(w|x) N_2(y|w)$. In
this case Alice can simulate the channel N on input x, by
simulating $N_1$ to produce intermediate output w on which Bob
locally applies $N_2$ to produce y. Since w is generally more
correlated with x than y, this will require more communication
than simply simulating N directly as in Lemma 5. However,
since Bob simulates $N_2$ using local randomness, the protocol
may require less shared randomness, and more importantly,
the total amount of communication plus shared randomness
may be lower.

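We will assume that the channels $N$, $N_1$ and $N_2$ are all
unweighted channels. Let the corresponding bipartite graphs
for $N, N_1, N_2$ have edges $E_{XY} \subseteq X \times Y$, $E_{XW} \subseteq X \times W$ and
$E_{WY} \subseteq W \times Y$, respectively. We use $\Gamma_{XY}(x)$ to denote the
neighbors of x in Y; that is, $\Gamma_{XY}(x) = \{y : (x, y) \in E_{XY}\}$.
Similarly, we can define $\Gamma_{YX}(y)$ to be the neighbors of y in X,
$\Gamma_{XW}(x)$ to be the neighbors of x in W and so on. We assume
that the graphs are regular, so that $|\Gamma_{XW}(x)| = |E_{XW}|/|X|$ for
all x, $|\Gamma_{WY}(w)| = |E_{WY}|/|W|$ for all w, and so on. Combined
with the fact that $N = N_2 \circ N_1$, we find that

$$\begin{aligned}
\frac{[y \in \Gamma_{XY}(x)]}{|E_{XY}|/|X|} = N(y|x) &= \sum_{w \in W} N_1(w|x) N_2(y|w) \\
&= \sum_{w \in W} \frac{[w \in \Gamma_{XW}(x)]\,[y \in \Gamma_{WY}(w)]}{|E_{XW}|/|X| \cdot |E_{WY}|/|W|} \\
&= \frac{|\Gamma_{XW}(x) \cap \Gamma_{YW}(y)|}{|E_{XW}| \cdot |E_{WY}| / (|X||W|)}.
\end{aligned} \tag{33}$$

Rearranging terms yields the identity

$$|\Gamma_{XW}(x) \cap \Gamma_{YW}(y)| = [y \in \Gamma_{XY}(x)]\, \frac{|E_{XW}| \cdot |E_{WY}|}{|E_{XY}| \cdot |W|}. \tag{34}$$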

The protocol is now defined in a way similar to the one in
Lemma 5.

Lemma 6: Choose positive integers r, m such that
$m = \gamma |X||W|/|E_{XW}|$ and $r = |E_{XY}||W|/(|E_{WY}||X|)$. Choose
disjoint sets $W_1, \ldots, W_r \subseteq W$ at random, each of size m. Let
$W_i = \{w_{i,1}, \ldots, w_{i,m}\}$. Let $\tilde{N}(y|x)$ be the channel resulting
from the following protocol:

1) Let $i \in [r]$ be a uniformly chosen random number shared
by Alice and Bob.

2) Given input x, Alice chooses a random $w_{i,j} \in \Gamma_{XW}(x) \cap W_i$
(assuming that one exists) and transmits its index
$j \in [m]$ to Bob.

3) Bob outputs y with probability $N_2(y|w_{i,j})$.

Then it holds with probability $\geq 1 - 2|E_{XY}| e^{-\epsilon^2 \gamma/32}$ that
$\|\tilde{N}(\cdot|x) - N(\cdot|x)\|_1 \leq \epsilon$ for all x.

We can take $\gamma = 32(\ln 2|E_{XY}|)/\epsilon^2$ and derandomize the
statement of the Lemma.

Note that in general we will have $rm < |W|$, so that this
protocol does not use all of W. This should not be surprising,
since faithfully simulating the channel $N_1$ should in general
be more expensive than simulating N. The trick is to modify
the simulation of $N_1$ that would be implied by Lemma 5 to
use less randomness, since we can rely on Bob's application
of $N_2$ to add in randomness at the next stage.



B. Proof of unweighted classical reverse Shannon theorem 

In this section we prove Lemma 5 and Lemma 6. The
main tool in both proofs is the Hoeffding bound for the
hypergeometric distribution [35]. The version we will need
is

Lemma 7 (Hoeffding [35]): For integers $0 < a \leq b < n$,
choose A and B to be random subsets of [n] satisfying $|A| = a$
and $|B| = b$. Then $\mu := \mathbb{E}[|A \cap B|] = ab/n$ and



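$$\begin{aligned}
\Pr[|A \cap B| \geq (1 + \epsilon)\mu] &\leq e^{-\mu\epsilon^2/2} &\quad (35) \\
\Pr[|A \cap B| \leq (1 - \epsilon)\mu] &\leq e^{-\mu\epsilon^2/2} &\quad (36) \\
\Pr[\,||A \cap B| - \mu| \geq \epsilon\mu\,] &\leq 2e^{-\mu\epsilon^2/2} &\quad (37)
\end{aligned}$$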
Now we turn to Lemma 5. We can calculate

$$\begin{aligned}
\|N(\cdot|x) - \tilde{N}(\cdot|x)\|_1 &= \sum_{y \in \Gamma(x)} |N(y|x) - \tilde{N}(y|x)| \\
&= \sum_{i=1}^{r} \sum_{y \in \Gamma(x) \cap Y_i} \left| \frac{1}{|\Gamma(x)|} - \frac{1}{r \cdot |\Gamma(x) \cap Y_i|} \right| \\
&= \sum_{i=1}^{r} \left| \frac{|\Gamma(x) \cap Y_i|}{|\Gamma(x)|} - \frac{1}{r} \right|.
\end{aligned} \tag{38}$$

To apply Lemma 7, take $A = \Gamma(x)$ and $B = Y_i$, so that
$a = |E|/|X| = r\gamma$, $b = m = |Y|/r$, $n = |Y|$ and $\mu = \gamma$. Then
each term in the sum in Eq. (38) is $\leq \epsilon/r$ with probability
$\geq 1 - 2e^{-\gamma\epsilon^2/2}$. Taking the union bound over all x and i
completes the proof of Lemma 5.

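The proof of Lemma 6 is similar. This time

$$\begin{aligned}
\tilde{N}(y|x) &= \frac{1}{r} \sum_{i=1}^{r} \sum_{w \in W_i} \Pr[\text{Alice sends } w \mid x, i]\; N_2(y|w) \\
&= \frac{1}{r} \sum_{i=1}^{r} \sum_{w \in W_i} \frac{[w \in \Gamma_{XW}(x)]}{|\Gamma_{XW}(x) \cap W_i|} \cdot \frac{[y \in \Gamma_{WY}(w)]}{|\Gamma_{WY}(w)|} \\
&= \frac{|W|}{r|E_{WY}|} \sum_{i=1}^{r} \frac{|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap W_i|}{|\Gamma_{XW}(x) \cap W_i|}.
\end{aligned}$$

We will use Lemma 7 twice. First, consider $|\Gamma_{XW}(x) \cap W_i|$.
This has expectation equal to $\gamma$ and therefore

$$\Pr[\,|\Gamma_{XW}(x) \cap W_i| \geq (1 + \epsilon/4)\gamma\,] \leq e^{-\gamma\epsilon^2/32}.$$

(We will see that the one-sided bound simplifies some of the
later calculations.) Taking the union bound over all $|X|r \leq |E_{XY}|$
values of x, i, we find that $|\Gamma_{XW}(x) \cap W_i| \leq (1 + \epsilon/4)\gamma$
for all x, i with probability $\geq 1 - |E_{XY}| e^{-\gamma\epsilon^2/32}$. Assuming
that this is true, we obtain

$$\begin{aligned}
\tilde{N}(y|x) &\geq \frac{|W|}{r|E_{WY}|} \sum_{i=1}^{r} \frac{|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap W_i|}{(1 + \epsilon/4)\gamma} \\
&= \frac{|W|}{r|E_{WY}|} \cdot \frac{|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap \tilde{W}|}{(1 + \epsilon/4)\gamma},
\end{aligned} \tag{39}$$

where we define $\tilde{W} = W_1 \cup \ldots \cup W_r$. Note that $\tilde{W}$ is a
random subset of W of size rm. Using Eq. (34) we find that
the expectation of $|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap \tilde{W}|$ is equal to $\gamma$ when $(x, y) \in E_{XY}$
and 0 otherwise. Again we use Lemma 7 to bound

$$\Pr[\,|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap \tilde{W}| \leq (1 - \epsilon/4)\gamma\,] \leq e^{-\gamma\epsilon^2/32}$$

for all $(x, y) \in E_{XY}$. Now we take the union bound over all
pairs $(x, y) \in E_{XY}$ to find that

$$|\Gamma_{XW}(x) \cap \Gamma_{YW}(y) \cap \tilde{W}| \geq (1 - \epsilon/4)\gamma \tag{40}$$

with probability $\geq 1 - |E_{XY}| e^{-\gamma\epsilon^2/32}$. When both Eq. (39)
and Eq. (40) hold and $(x, y) \in E_{XY}$ it follows that

$$\tilde{N}(y|x) \geq \frac{|W|}{r|E_{WY}|} \cdot \frac{1 - \epsilon/4}{1 + \epsilon/4} \geq (1 - \epsilon/2)\frac{|X|}{|E_{XY}|} = (1 - \epsilon/2) N(y|x). \tag{41}$$

Finally we compare with Eq. (33) to obtain

$$\|N(\cdot|x) - \tilde{N}(\cdot|x)\|_1 = 2 \sum_{y} \max(0, N(y|x) - \tilde{N}(y|x)) \leq \epsilon \sum_{y} N(y|x) = \epsilon. \tag{42}$$

This concludes the proof of Lemma 6.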

C. Classical types 

In this section we show how the classical method of
types can be used to extend Lemmas 5 and 6 to prove the
coding parts of Theorem 1. We begin with a summary of the
arguments aimed at readers already familiar with the method
of types. The idea is for Alice to draw a joint type according
to the appropriate distribution and send this to Bob. This
requires $O(\log(n))$ bits of communication and conditioned
on this joint type they are left with an unweighted channel
and can apply Lemma 5. It is then a counting exercise to
show that the communication and randomness costs are as
claimed. For the low-randomness case, the protocol is based on
a decomposition of $N$ into $N_2 \circ N_1$. Alice draws
an appropriate joint type for all three variables (X, W, Y) and
transmits this to Bob. Again this involves $O(\log(n))$ bits of
communication and leaves them with an unweighted channel,
this time of the form that can be simulated with Lemma 6.

To prove these claims, we begin by reviewing the method of
types, following [15]. Consider a string $x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$.
Define the type of $x^n$ to be the $|\mathcal{X}|$-tuple of integers
$t(x^n) := \sum_{j=1}^{n} e_{x_j}$, where $e_j \in \mathbb{Z}^{|\mathcal{X}|}$ is the unit vector with
a one in the $j$-th position. Thus $t(x^n)$ counts the frequency
of each symbol $x \in \mathcal{X}$ in $x^n$. Let $\mathcal{T}^n_{\mathcal{X}}$ denote the set of all
possible types of strings in $\mathcal{X}^n$. Since an element of $\mathcal{T}^n_{\mathcal{X}}$ can
be written as $|\mathcal{X}|$ numbers ranging from $0, \ldots, n$ we obtain
the simple bound $|\mathcal{T}^n_{\mathcal{X}}| = \binom{n + |\mathcal{X}| - 1}{|\mathcal{X}| - 1} \leq (n+1)^{|\mathcal{X}|}$. For a type
t, let the normalized probability distribution $\bar{t} := t/n$ denote
its empirical distribution.

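For a particular type $t \in \mathcal{T}^n_{\mathcal{X}}$, denote the set of all strings in
$\mathcal{X}^n$ with type t by $T_t = \{x^n \in \mathcal{X}^n : t(x^n) = t\}$. From [15],
we have

$$(n+1)^{-|\mathcal{X}|} \exp(nH(\bar{t})) \leq |T_t| = \binom{n}{t} \leq \exp(nH(\bar{t})), \tag{43}$$

where $\binom{n}{t}$ is defined to be $\frac{n!}{\prod_{x \in \mathcal{X}} t_x!}$. Next, let p be a probability
distribution on $\mathcal{X}$ and $p^{\otimes n}$ the probability distribution
on $\mathcal{X}^n$ given by n i.i.d. copies of p, i.e. $p^{\otimes n}(x^n) := p(x_1) \cdots p(x_n)$.
Then for any $x^n \in T_t$ we have
$p^{\otimes n}(x^n) = \prod_{x \in \mathcal{X}} p(x)^{t_x} = \exp(-n(H(\bar{t}) + D(\bar{t}\|p)))$. Combining this
with Eq. (43), we find that

$$\frac{\exp(-nD(\bar{t}\|p))}{(n+1)^{|\mathcal{X}|}} \leq p^{\otimes n}(T_t) \leq \exp(-nD(\bar{t}\|p)). \tag{44}$$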

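Thus, as n grows large, we are likely to observe an empirical
distribution $\bar{t}$ that is close to the actual distribution p. To
formalize this, define the set of typical sequences $T^n_{p,\delta}$ by

$$T^n_{p,\delta} = \bigcup_{t \in \mathcal{T}^n_{\mathcal{X}} :\, \|\bar{t} - p\|_1 \leq \delta} T_t. \tag{45}$$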
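The counting facts behind Eq. (43) are easy to check numerically for small n. The sketch below is illustrative only: it computes types and type-class sizes exactly and tests the two-sided bound of Eq. (43) with entropies measured in nats (so that `exp` is the natural exponential). The function names are assumptions made for this example.

```python
import math
from collections import Counter
from itertools import product

def type_of(string, alphabet):
    """Type t(x^n): the tuple of symbol counts, ordered as in `alphabet`."""
    counts = Counter(string)
    return tuple(counts.get(a, 0) for a in alphabet)

def type_class_size(t):
    """|T_t| = n! / prod_x t_x!, the multinomial coefficient."""
    size = math.factorial(sum(t))
    for c in t:
        size //= math.factorial(c)   # each partial quotient is an integer
    return size

def empirical_entropy_nats(t):
    """Entropy (in nats) of the empirical distribution t / n."""
    n = sum(t)
    return -sum((c / n) * math.log(c / n) for c in t if c > 0)

def satisfies_eq43(t):
    """Check (n+1)^{-|X|} exp(nH) <= |T_t| <= exp(nH) for one type."""
    n, k = sum(t), len(t)
    upper = math.exp(n * empirical_entropy_nats(t))
    lower = upper / (n + 1) ** k
    return lower <= type_class_size(t) <= upper

def all_types(n, k):
    """All types of length-n strings over an alphabet of size k."""
    return [t for t in product(range(n + 1), repeat=k) if sum(t) == n]
```

For instance, `type_of("abca", "abc")` gives `(2, 1, 1)`, whose type class has 12 strings, and `all(satisfies_eq43(t) for t in all_types(6, 3))` checks the bound for every type of length-6 ternary strings.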
To bound $p^{\otimes n}(T^n_{p,\delta})$, we apply Pinsker's inequality [47]:

$$D(q\|p) \geq \tfrac{1}{2}\|p - q\|_1^2 \tag{46}$$

to show that

$$p^{\otimes n}(T^n_{p,\delta}) \geq 1 - (n+1)^{|\mathcal{X}|} \exp\left(-\frac{n\delta^2}{2}\right). \tag{47}$$

We will also need Fannes' inequality [21] which establishes
the continuity of the entropy function. Let $\eta(x) = -x \log x$
and $\bar{\eta}(x) = \eta(\min(x, 1/e))$. Then if p, q are probability
distributions on d letters,

$$|H(p) - H(q)| \leq \|p - q\|_1 \log(d) + \bar{\eta}(\|p - q\|_1). \tag{48}$$

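If we have a pair of strings $x^n \in \mathcal{X}^n$, $y^n \in \mathcal{Y}^n$, then we can
define their joint type $t(x^n y^n)$ simply to be the type of the
string $(x_1 y_1, \ldots, x_n y_n) \in (\mathcal{X} \times \mathcal{Y})^n$. Naturally the bounds in
Eq. (43) and Eq. (44) apply equally well to joint types, with
$\mathcal{X}$ replaced by $\mathcal{X} \times \mathcal{Y}$. If t is a joint type then we can define its
marginals $t^X \in \mathbb{Z}^{|\mathcal{X}|}$ and $t^Y \in \mathbb{Z}^{|\mathcal{Y}|}$ by $t^X_x = \sum_{y \in \mathcal{Y}} t_{x,y}$ and
$t^Y_y = \sum_{x \in \mathcal{X}} t_{x,y}$. Let $N(y|x)$ denote a noisy channel from
$\mathcal{X} \to \mathcal{Y}$ with $N^n(y^n|x^n) := N(y_1|x_1) \cdots N(y_n|x_n)$. Then
$N^n(y^n|x^n)$ depends only on the type $t = t(x^n y^n)$ according
to $N^n(y^n|x^n) = \prod_{x,y} N(y|x)^{t_{x,y}}$.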

We now have all the tools we need to reduce Theorem 1
to Lemmas 5 and 6. First, consider parts (a,b) of Theorem 1,
when we have unlimited shared randomness. In either case,
the protocol is as follows:

1) Suppose Alice's input is $x^n$. Let $s = t(x^n)$.

2) Alice chooses a random joint type $t \in \mathbb{Z}^{|\mathcal{X} \times \mathcal{Y}|}$ according
to the distribution

$$\Pr[t] = \begin{cases} \dfrac{|T_t|}{|T_s|} \prod_{x,y} N(y|x)^{t_{x,y}} & \text{if } s = t^X \\ 0 & \text{otherwise} \end{cases}$$

3) Alice sends t to Bob using $|\mathcal{X} \times \mathcal{Y}| \log(n + 1)$ bits.

4) Let $X = T_{t^X}$, $Y = T_{t^Y}$ and $E = T_t \subseteq X \times Y$ define a
regular bipartite graph. To simulate the action of $N^n$ on
$x^n \in X$, conditioned on $(x^n, y^n) \in T_t = E$, we need
only to choose a random neighbor of $x^n$ in this graph.
This is achieved with Lemma 5.

This protocol is depicted in Fig. 6.
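Step 2 of the protocol draws the joint type with the probability that the true channel would produce it. As a sanity check, the sketch below assumes that this probability is the number of outputs $y^n$ of joint type t with a fixed $x^n$ of type s, which equals $\prod_x s_x!/\prod_y t_{x,y}!$, times $\prod_{x,y} N(y|x)^{t_{x,y}}$, and verifies with exact rationals that these probabilities sum to one. The function names and the representation of a joint type as a tuple of rows are assumptions of this illustration.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def compositions(total, parts):
    """All tuples of `parts` nonnegative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def multinomial(counts):
    size = factorial(sum(counts))
    for c in counts:
        size //= factorial(c)
    return size

def joint_type_distribution(s, channel):
    """Distribution of the joint type t given input type s.

    s:        tuple of input symbol counts
    channel:  channel[x][y] = N(y|x) as a Fraction
    Returns a dict mapping t (a tuple of rows, row x = counts t_{x,y}) to Pr[t].
    """
    row_options = [list(compositions(s_x, len(channel[x])))
                   for x, s_x in enumerate(s)]
    dist = {}
    for rows in product(*row_options):
        prob = Fraction(1)
        for x, row in enumerate(rows):
            prob *= multinomial(row)          # number of ways to place outputs
            for y, count in enumerate(row):
                prob *= channel[x][y] ** count
        dist[rows] = prob
    return dist

example_channel = [[Fraction(3, 4), Fraction(1, 4)],
                   [Fraction(1, 3), Fraction(2, 3)]]
example_dist = joint_type_distribution((2, 3), example_channel)
```

Here `sum(example_dist.values())` equals 1 exactly, as the multinomial theorem requires, and there are only polynomially many joint types to describe, which is why sending t costs $O(\log n)$ bits.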

It remains only to analyze the cost of this last step. The
communication cost is (taking notation from the statement of
Lemma 5)

$$\begin{aligned}
\log(m) &= \log \frac{|X||Y|\gamma}{|E|} = \log \frac{|T_{t^X}||T_{t^Y}|\,(2 \ln 2|T_t|)/\epsilon^2}{|T_t|} \\
&\leq n(H(\bar{t}^X) + H(\bar{t}^Y) - H(\bar{t})) + |\mathcal{X} \times \mathcal{Y}| \log(n + 1) + \log(2 \ln(2)\, n H(\bar{t})/\epsilon^2) \\
&= n I(X;Y)_{\bar{t}} + O(\log(n)) + \log(1/\epsilon^2).
\end{aligned}$$

Since $C(N) \geq I(X;Y)_{\bar{t}}$ for all t, this establishes part (b) of
Theorem 1. Continuing, we estimate the randomness cost to
be

$$\begin{aligned}
\log(r) &\leq \log \frac{|E|}{|X|} = \log \frac{|T_t|}{|T_{t^X}|} \\
&\leq n(H(\bar{t}) - H(\bar{t}^X)) + O(\log(n)) \\
&= n H(Y|X)_{\bar{t}} + O(\log(n)).
\end{aligned}$$

Fig. 6. The protocol for the classical reverse Shannon theorem (Theorem 1).

To prove part (a), we need to relate entropic quantities
defined for $\bar{t}$ to the corresponding quantities for p. This
will be done with typical sets (Eq. (47)) and Fannes' inequality
(Eq. (48)). If p is a distribution on $\mathcal{X}$ then let
$q = N_F(p)$ be the joint distribution on X and Y that
results from sending X through N and obtaining output
Y. Then Eq. (47) implies that following the above protocol
results in values of t that are very likely to be close to
q. In particular, $q^{\otimes n}(T^n_{q,\delta}) \geq 1 - (n+1)^d e^{-n\delta^2/2}$, where
$d = |\mathcal{X} \times \mathcal{Y}|$. Next, Fannes' inequality says that if $t \in T^n_{q,\delta}$
and $\delta \leq 1/e$ then $|H(\bar{t}) - H(q)| \leq \delta \log(d/\delta)$. Applying
this to each term in $I(X;Y) = H(X) + H(Y) - H(XY)$,
we obtain that $|I(X;Y)_{\bar{t}} - I(X;Y)_q| \leq 3\delta \log(d/\delta)$ and
$|H(Y|X)_{\bar{t}} - H(Y|X)_q| \leq 2\delta \log(d/\delta)$. Taking $\delta$ to be $n^{-1/4}$,
we obtain a sequence of protocols where both error and
inefficiency simultaneously vanish as $n \to \infty$.

Similarly, for part (c), we need to consider the joint distribution
q of XWY that results from drawing X according to
p, sending it through $N_1$ to obtain W and then sending W
through $N_2$ to obtain Y. The protocol is as follows:

1) Suppose Alice's input is $x^n$.

2) Alice simulates $N_1^n(x^n)$ to obtain $w^n$ and then simulates
$N_2^n(w^n)$ to obtain $y^n$.

3) Alice sets $t = t(x^n w^n y^n)$. She will not make any further
use of $w^n$ or $y^n$.

4) Alice sends t to Bob using $|\mathcal{X} \times \mathcal{W} \times \mathcal{Y}| \log(n + 1)$ bits.

5) Define $X = T_{t^X}$, $W = T_{t^W}$, $Y = T_{t^Y}$, $E_{XY} = T_{t^{XY}}$,
$E_{XW} = T_{t^{XW}}$ and $E_{WY} = T_{t^{WY}}$. To simulate the
action of $N^n$ on $x^n \in X$, conditioned on $(x^n, w^n, y^n) \in T_t$,
we need only to choose a random element of
$\Gamma_{XY}(x^n)$ in this graph. This is achieved with Lemma 6.

The analysis of this last step is similar to that of the
previous protocol. The communication cost is $\log(m) =
\log(|X||W|\gamma/|E_{XW}|) = n I(X;W)_{\bar{t}} + O(\log n)$, and the
randomness cost is

$$\begin{aligned}
\log(r) &= \log \frac{|E_{XY}||W|}{|E_{WY}||X|} \\
&= n(H(XY)_{\bar{t}} + H(W)_{\bar{t}} - H(WY)_{\bar{t}} - H(X)_{\bar{t}}) + O(\log n) && \text{using Eq. (43)} \\
&= n(H(XY)_{\bar{t}} + H(XW)_{\bar{t}} - H(XWY)_{\bar{t}} - H(X)_{\bar{t}}) + O(\log n) && \text{using the Markov condition } I(X;Y|W)_{\bar{t}} = 0 \\
&= n I(Y;W|X)_{\bar{t}} + O(\log n) \\
&= n(I(XY;W)_{\bar{t}} - I(X;W)_{\bar{t}}) + O(\log n).
\end{aligned} \tag{49}$$

This concludes the proofs of the existence of channel simulations
claimed in Theorem 1.
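The chain of entropy equalities in Eq. (49) can be checked numerically on any Markov chain X - W - Y. The sketch below is an illustration under assumed example parameters: it builds the joint distribution $p(x) N_1(w|x) N_2(y|w)$, evaluates the counting form $H(XY) + H(W) - H(WY) - H(X)$, the conditional mutual information $I(Y;W|X)$, and the difference $I(XY;W) - I(X;W)$, and all three are expected to coincide. The function names and the numerical channels are hypothetical.

```python
import math
from itertools import product

def entropy(joint, keep):
    """Shannon entropy (bits) of the marginal on the coordinates in `keep`."""
    marginal = {}
    for outcome, prob in joint.items():
        key = tuple(outcome[i] for i in keep)
        marginal[key] = marginal.get(key, 0.0) + prob
    return -sum(p * math.log2(p) for p in marginal.values() if p > 0)

def markov_joint(p_x, n1, n2):
    """Joint distribution of (X, W, Y) with n1[x][w] = N1(w|x), n2[w][y] = N2(y|w)."""
    joint = {}
    for x, w, y in product(range(len(p_x)), range(len(n1[0])), range(len(n2[0]))):
        joint[(x, w, y)] = p_x[x] * n1[x][w] * n2[w][y]
    return joint

def randomness_rate_forms(joint):
    """Three expressions for the per-use shared-randomness rate."""
    X, W, Y = 0, 1, 2
    h = lambda *keep: entropy(joint, keep)
    counting = h(X, Y) + h(W) - h(W, Y) - h(X)          # from the sizes in log(r)
    cond_mi = h(X, Y) + h(X, W) - h(X, W, Y) - h(X)     # I(Y;W|X)
    mi_xy_w = h(X, Y) + h(W) - h(X, W, Y)               # I(XY;W)
    mi_x_w = h(X) + h(W) - h(X, W)                      # I(X;W)
    return counting, cond_mi, mi_xy_w - mi_x_w

example_joint = markov_joint([0.3, 0.7],
                             [[0.9, 0.1], [0.2, 0.8]],
                             [[0.6, 0.4], [0.25, 0.75]])
forms = randomness_rate_forms(example_joint)
```

The last two expressions agree for any joint distribution; the first agrees with them only because of the Markov condition $I(X;Y|W) = 0$, which is the step singled out in the derivation.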

D. Converses 

In this section we discuss why the communication rates for 
the above protocols cannot be improved. The lower bound for 
simulating feedback channels was proven in [51] and for non- 
feedback channels in [16]. We will not repeat the proofs here, 
but only sketch the intuition behind them. 

First, the communication cost must always be at least $C(N)$,
or $I(X;Y)_p$ if the input is restricted to be from the distribution
p. Otherwise we could combine the simulation with Shannon's
[forward] noisy channel coding theorem to turn a small
number of noiseless channel uses into a larger number of uses.
This is impossible even when shared randomness is allowed.

Next, if feedback is used then Bob's output (with entropy
$H(Y)$) must be entirely determined by the C bits of classical
communication sent and the R bits of shared randomness used.
Therefore we must have $C + R \geq H(Y)$.

The situation is more delicate when the simulation does not
need to provide feedback to Alice. Suppose we have a protocol
that uses C cbits and R rbits. Then let $W = (W_1, W_2)$ comprise
both the message sent ($W_1$) and the shared random string
($W_2$). We immediately obtain $I(XY;W) \leq H(W) \leq C + R$.
Additionally, the shared randomness $W_2$ and the input
X are independent even given the message $W_1$, in other
words $I(X;W_2|W_1) = 0$. Thus $I(X;W) = I(X;W_1) \leq
H(W_1) \leq C$. Finally we observe that $X - W - Y$ satisfies
the Markov chain condition since Bob produces Y only by
observing W. This argument is discussed in more detail in
[16], where it is also proven that it suffices to consider
single-letter maximizations.

We observe that some of these converses are obtained from
coding theorems and others are obtained from more traditional
entropic bounds. In the cases where the converses are obtained
from coding theorems, we in fact generally obtain strong
converses, meaning that fidelity decreases exponentially when
we try to use less communication or randomness than necessary.
This is discussed in [51] and we will discuss a quantum
analogue of this point in Sec. IV-E.

IV. Simulation of quantum channels on arbitrary inputs

This section is devoted to proving parts (d) and (e) of 
Theorem 3. 



A. The case of flat spectra 

By analogy with Sec. III-B, we will first state an unweighted 
or "flat" version of the quantum reverse Shannon theorem. 
We will then use a quantum version of type theory (based 
on Schur-Weyl duality) to extend this to prove the QRST for 
general inputs. 

Definition 8: An isometry $V^{A \to BE}$ is called flat if, when
applied to half of a maximally entangled state $|\Phi\rangle^{RA}$, it
produces a state $|\psi\rangle^{RBE}$ with $\psi^R$, $\psi^B$ and $\psi^E$ each maximally
mixed.

(The requirement that $\psi^R$ be maximally mixed is satisfied automatically,
but we include it to emphasize that each marginal
of $\psi$ should be maximally mixed.) An important special case
of flat channels occurs when $A$, $B$, $E$ are irreps of some group
$G$ and $V$ is a $G$-invariant map. We will return to this point in
Sec. IV-C2.

Lemma 9 ([37], [1]): Let $R, B, E, K, M$ have dimensions
$D_R, D_B, D_E, D_K, D_M$ respectively. Let $D_B = D_K D_M$ and
assume that $D_M \geq \gamma \sqrt{D_R D_B / D_E}$. If $V^{A \to BE}$ is a flat
isometry, it can be simulated up to error $4\gamma^{-1/2}$ by consuming
$\log(D_K)$ ebits and sending $\log(D_M)$ qubits from Alice to
Bob.

In an earlier unpublished version of this work, we proved 
a version of Lemma 9 using the measurement compression 
theorem of [52]. This version used classical instead of quantum 
communication (with correspondingly different rates), but by 
making the communication coherent in the sense of [24], [18] 
it is possible to recover Lemma 9. 

However, a conceptually simpler proof of Lemma 9 was
later given by [17], [37], [1]. This proof is based on reversing
"state merging," which we can think of as the task of Bob
sending a subsystem $B$ to Alice in a way that preserves its
correlation with a subsystem $E$ which Alice already has, as
well as with a purifying reference system $R$. In other words,
merging is a state redistribution of the form
$\psi^{R:E:B} \to \psi^{R:EB}$.

The simplest proof of state merging is given in [1], where
it is shown that if Bob splits $B$ randomly into systems $B_1$
and $B_2$ of the appropriate sizes, and sends $B_1$ to Alice,
then Alice will be able to locally transform $E, B_1$ into two
subsystems $A$ and $A_2$ such that $A$ is completely entangled with
the reference system $R$ (and thus can be locally transformed
by Alice into $E, B$, the desired goal of the merging). On
the other hand, $A_2$ is nearly completely entangled with the
remaining $B_2$ system that Bob kept, so that it represents a
byproduct of entanglement between Alice and Bob that has
been generated by the protocol. When executed in reverse,
the merging becomes splitting, and the $A_2 B_2$ entanglement
becomes a resource that is consumed, along with the quantum
transmission of system $B_1$ from Alice to Bob, in order to
implement the state-splitting redistribution

$$\psi^{R:A} \to \psi^{R:EB} \to \psi^{R:E:B}. \qquad (51)$$



B. Tensor power inputs 

We next need to reduce the general channel simulation
problem to the problem of simulating flat channels. To get the
idea of how this works, consider first the problem of simulating
$\mathcal{N}^{\otimes n}$ on a tensor power input $\rho^{\otimes n}$. While solutions to this
problem have been previously described in [37], [17], [1], we
will present a protocol in a way that will help us understand
the general case.

Let $|\sigma\rangle^{RBE} = (I \otimes \mathcal{N})|\Phi_\rho\rangle^{RA}$ and $|\psi\rangle = |\sigma\rangle^{\otimes n}$. Unfortunately,
none of $\psi^R$, $\psi^B$ nor $\psi^E$ are in general maximally
mixed. Even restricting to typical subspaces still leaves these
states with eigenvalues that vary over a range of $2^{\pm O(\sqrt{n})}$.

On the other hand, these eigenvalues have a large amount
of degeneracy. Let $\{|a_1\rangle, \ldots, |a_{d_A}\rangle\}$ be the eigenbasis of
$\rho = \sigma^A$. Then the eigenvectors of $\psi^A$ can be taken to be
of the form $|a_{i_1}\rangle \otimes \cdots \otimes |a_{i_n}\rangle$, for $i = (i_1, \ldots, i_n) \in [d_A]^n$.
Moreover, the corresponding eigenvalue is determined entirely
by the type $t_A$ of $i$, just as in the classical case. There are
$\binom{n+d_A-1}{n}$ types. For fixed $d_A$, this number is polynomial
in $n$, and thus the "which type" information can be transmitted
using $O(\log n)$ qubits. Conditioned on this information, we
are left with a flat spectrum over a space whose dimension
depends on the type.
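
To make the "polynomially many types" claim concrete, the sketch below counts the types of length-$n$ strings over an alphabet of size $d$ and the number of bits (or qubits) that suffice to index one of them. The function names are illustrative, and the count used is the standard stars-and-bars formula rather than anything specific to this paper.

```python
from math import comb


def num_types(n: int, d: int) -> int:
    """Number of types (empirical distributions) of length-n strings
    over an alphabet of d symbols: compositions of n into d parts."""
    return comb(n + d - 1, d - 1)


def bits_for_type(n: int, d: int) -> int:
    """Number of bits (or qubits) sufficient to index every type."""
    return max(1, (num_types(n, d) - 1).bit_length())
```

For example, with $d = 3$ and $n = 1000$ there are 501501 types, so 19 qubits are enough to carry the type label, compared with the roughly $n \log_2 3$ qubits occupied by the input itself.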

The same decomposition into types can be performed for
the $B$ and $E$ systems, and for constant $d_B$ and $d_E$ we will
still have at most poly(n) types $t_B$ and $t_E$. Furthermore, we
can decompose the action of $\mathcal{N}^{\otimes n}$ into a map from $t_A$ to a
superposition of $t_B$ and $t_E$, followed by a flat map within the
type classes.

The only remaining question is to determine the communication
rate. Here we can use the classical theory of types
from Sec. III-C to argue that almost all of the weight of $\psi$
is concentrated in strings with $t_A, t_B, t_E$ close to the spectra
of $\sigma^A$, $\sigma^B$ and $\sigma^E$ respectively. If "close" is defined to be
distance $\delta$, then ignoring the atypical types incurs error at most
$\exp(-O(n\delta))$, and we are left with subspaces of dimensions
$D_A = \exp(n(S(A)_\sigma \pm \delta))$, $D_B = \exp(n(S(B)_\sigma \pm \delta))$ and
$D_E = \exp(n(S(E)_\sigma \pm \delta))$. Applying Lemma 9 we obtain the
claimed communication rates of $\frac{S(A)+S(B)-S(E)}{2} = \frac{1}{2} I(R;B)$
qubits and $\frac{S(B)+S(E)-S(A)}{2} = \frac{1}{2} I(E;B)$ ebits.
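
The rate formulas can be evaluated directly from the three marginal entropies. The sketch below does this; the function name is hypothetical, and the worked example (a qubit dephasing channel with phase-flip probability $p$ acting on a maximally mixed input, for which $S(A) = S(B) = 1$ and $S(E) = h(p)$) is an assumption chosen for illustration rather than an example taken from the text.

```python
import math
from typing import Tuple


def flat_simulation_rates(s_a: float, s_b: float, s_e: float) -> Tuple[float, float]:
    """Per-use (qubit, ebit) rates from the marginal entropies of the
    purified channel output, as in the tensor-power protocol."""
    qubits = (s_a + s_b - s_e) / 2   # equals I(R;B)/2
    ebits = (s_b + s_e - s_a) / 2    # equals I(E;B)/2
    return qubits, ebits


def binary_entropy(p: float) -> float:
    """Shannon entropy (in bits) of a Bernoulli(p) random variable."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


# Illustrative example: dephasing channel, maximally mixed input.
p = 0.1
qubit_rate, ebit_rate = flat_simulation_rates(1.0, 1.0, binary_entropy(p))
```

Note that the two rates always sum to $S(B)$, so any entropy of Bob's output that is not supplied by forward communication is drawn from shared entanglement.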

One subtlety arises from combining communication protocols
involving different input and output types. Unlike in
the classical channel simulation protocol, Alice would like
to communicate to Bob only $t_B$ and not $t_A$ or $t_E$. Indeed,
she would like to forget $t_A$ and retain only knowledge of $t_E$
for herself. On the other hand, the flat-spectrum simulation
protocol of Lemma 9 requires knowledge of $t_A$, $t_B$ and $t_E$
for the encoder, although only knowledge of $t_B$ is needed
for the decoder. These requirements are met by the following
protocol.

1) Alice simulates the map from $t_A \to (t_B, t_E)$ in a way
that leaves her $t_A$ register intact.

2) She performs the encoding of Lemma 9 on her remaining
$\log(D_A)$ qubits, keeping $\log(D_E)$ qubits for herself
as output and sending $\log(D_M)$ qubits to Bob.

3) Using her $t_B, t_E$ registers, she erases her record of $t_A$.

4) Alice sends $t_B$ to Bob.

5) Bob uses $t_B$ and the $K$ and $M$ registers to perform the
decoding procedure of Lemma 9.

One final difficulty is that the sizes of the $K$ and $M$ registers
appear to depend on $t_A, t_B, t_E$. While these registers vary
in size by only $O(n\delta)$ qubits, there is still the risk that
even information about their size may leak information to
Bob about $t_A$ and $t_E$ that he would have to somehow send
back using additional rounds of communication. For the $M$
register we can address this by simply taking $D_M$ to equal
$\max \lceil \gamma \sqrt{|T_{t_A}| |T_{t_B}| / |T_{t_E}|} \rceil$, where the maximum is taken over
all typical triples of $t_A, t_B, t_E$. Thus, $D_M$ is independent of
any of the registers communicated during the protocol.

However, since $D_K D_M$ must equal $|T_{t_B}|$, we cannot avoid
having $D_K$ vary with $t_B$. (There is a minor technical point
related to $D_K$ needing to be an integer, but this can be ignored
at the cost of an exponentially small error.) As a result,
we need to run the protocol of Lemma 9 in superposition
using different numbers of EPR pairs in different branches
of the superposition. This cannot be accomplished simply by
discarding the unnecessary EPR pairs in the branches of the
superposition that need less entanglement; instead we need
to use one of the techniques from Sec. II-C. Fortunately,
since the number of EPR pairs varies by only $O(n\delta)$ across
different values of $t_B$, we only need to generate $O(n\delta)$
bits of entanglement spread. This can be done with $O(n\delta)$
extra qubits of communication, leading to an asymptotically
vanishing increase in the communication rate.

Earlier versions of the quantum reverse Shannon theorem 
did not need to mention this sublinear amount of entanglement 
spread because the extra sublinear communication cost could 
be handled automatically by the protocols used. However, 
when we consider non-tensor power inputs in Sec. IV-D we 
will need to make a more explicit accounting of the costs of 
entanglement spread. 

C. A quantum theory of types 

There is one further difficulty which arises when considering
non-tensor power inputs. This problem can already be seen
in the case when the input to the channel is of the form
$\frac{1}{2}(\rho_1^{\otimes n} + \rho_2^{\otimes n})$. If $\rho_1$ and $\rho_2$ do not commute, then we cannot
run the protocol of the previous section without first estimating
the eigenbases of $\rho_1$ and $\rho_2$. Moreover, we need to perform
this estimation in a non-destructive way and then be able to
uncompute our estimates of the eigenbasis, as well as any
intermediate calculations used. Such techniques have been
used to perform quantum data compression of $\rho^{\otimes n}$ when $\rho$
is unknown [38], [7]. However, even for that much simpler
problem they require delicate analysis. We believe that it
is possible to prove the quantum reverse Shannon theorem
by carefully using state estimation, but instead will present
a somewhat simpler proof that makes use of representation
theory.

The area of representation theory we will use is known as 
Schur duality (or Schur-Weyl duality). It has also been used 
for data compression of unknown tensor power states [31], 
[32], [28] and entanglement concentration from tensor powers 
of unknown pure entangled states [30], [33]. Some reviews 
of the role of Schur duality in quantum information can be 
found in Chapters 5 and 6 of [25] and Chapters 1 and 2 
of [12]. A detailed explanation of the mathematics behind
Schur duality can also be found in [22]. Our treatment will
follow [25]. In Sec. IV-C1, we will explain how Schur duality
can serve as a quantum analogue of the classical method of
types that we described in Sec. III-C. Then in Sec. IV-C2
we will show this can be applied to channels, allowing us to
decompose $\mathcal{N}^{\otimes n}$ into a superposition of flat channels. Finally,
in Sec. IV-C3 we will use this to describe quantum analogues
of conditional types. We will use this to show that the atypical
flat sub-channels involve only an exponentially small amount
of amplitude.

In Sec. IV-D, we will use these tools to prove Theorem 3.

1) Schur duality and quantum states: This section will 
review the basics of Schur duality and explain how it can 
serve as a quantum analogue of the classical method of types. 
Let $S_n$ denote the permutation group on $n$ objects and let $\mathcal{U}_d$
denote the $d$-dimensional unitary group. Both groups have a
natural action on $(\mathbb{C}^d)^{\otimes n}$. For $u \in \mathcal{U}_d$ define $Q(u) = u^{\otimes n}$ and
for $s \in S_n$ define $P(s)$ to permute the $n$ systems according
to $s$: namely,

$$P(s) = \sum_{i_1, \ldots, i_n \in [d]} |i_1, \ldots, i_n\rangle \langle i_{s(1)}, \ldots, i_{s(n)}|.$$

These two representations commute, and can be simultaneously
decomposed into irreducible representations (a.k.a.
irreps). (We can also think of $Q(u)P(s)$ as a reducible
representation of $\mathcal{U}_d \times S_n$.)

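Define $\mathcal{I}_{d,n}$ to be the set of partitions of $n$ into $d$ parts: that
is, $\mathcal{I}_{d,n} = \{\lambda = (\lambda_1, \lambda_2, \ldots, \lambda_d) \in \mathbb{Z}^d : \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_d \geq 0$
and $\sum_{i=1}^d \lambda_i = n\}$. Note that $|\mathcal{I}_{d,n}| \leq |\mathcal{T}_{d,n}| \leq (n+1)^d =
\mathrm{poly}(n)$. It turns out that $\mathcal{I}_{d,n}$ labels the irreps of both $\mathcal{U}_d$
and $S_n$ that appear in the decompositions of $Q$ and $P$. Define
these representation spaces to be $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$, and define the
corresponding representation matrices to be $\mathbf{q}_\lambda(u)$ and $\mathbf{p}_\lambda(s)$.
Sometimes we write $\mathcal{Q}_\lambda^d$ or $\mathbf{q}_\lambda^d$ to emphasize the $d$-dependence;
no such label is needed for $\mathcal{P}_\lambda$ since $\lambda$ already determines $n$.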

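Schur duality states that $(\mathbb{C}^d)^{\otimes n}$ decomposes under the
simultaneous actions of $Q$ and $P$ as

$$(\mathbb{C}^d)^{\otimes n} \cong \bigoplus_{\lambda \in \mathcal{I}_{d,n}} \mathcal{Q}_\lambda \otimes \mathcal{P}_\lambda. \qquad (52)$$

This means that we can decompose $(\mathbb{C}^d)^{\otimes n}$ into three registers:
an irrep label $\lambda$ which determines the actions of $\mathcal{U}_d$ and
$S_n$, a $\mathcal{U}_d$-irrep $\mathcal{Q}_\lambda$ and an $S_n$-irrep $\mathcal{P}_\lambda$. Since the dimension
of $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$ depends on $\lambda$, the registers are not in a strict
tensor product. However, by padding the $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$ registers
we can treat the $\lambda$, $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$ registers as being in a tensor
product.

The isomorphism in Eq. (52) implies the existence of a
unitary transform $U_{\mathrm{Sch}}$ that maps $(\mathbb{C}^d)^{\otimes n}$ to $\bigoplus_{\lambda \in \mathcal{I}_{d,n}} \mathcal{Q}_\lambda \otimes \mathcal{P}_\lambda$
in a way that commutes with the action of $\mathcal{U}_d$ and $S_n$.
Specifically we have that for any $u \in \mathcal{U}_d$ and any $s \in S_n$,

$$U_{\mathrm{Sch}} Q(u) P(s) U_{\mathrm{Sch}}^\dagger = \sum_{\lambda \in \mathcal{I}_{d,n}} |\lambda\rangle\langle\lambda| \otimes \mathbf{q}_\lambda(u) \otimes \mathbf{p}_\lambda(s). \qquad (53)$$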

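While we have described Schur duality in terms of the
representation theory of $S_n$ and the Lie group $\mathcal{U}_d$, there exists
a similar relation between $S_n$ and the Lie algebra $\mathfrak{gl}_d$. Letting
$\mathbf{q}_\lambda$ denote $\mathfrak{gl}_d$-irreps as well, one can show an analogue of
Eq. (53) for tensor power states:

$$U_{\mathrm{Sch}} \rho^{\otimes n} U_{\mathrm{Sch}}^\dagger = \sum_{\lambda \in \mathcal{I}_{d,n}} |\lambda\rangle\langle\lambda| \otimes \mathbf{q}_\lambda(\rho) \otimes I_{\mathcal{P}_\lambda}. \qquad (54)$$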

So far we have not had to describe in detail the structure of
the irreps of $\mathcal{U}_d$ and $S_n$. In fact, we will mostly not need
to do this in order to develop quantum analogues of the
classical results from Sec. III-C. Here, the correct analogue
of a classical type is in fact $\lambda$ together with $\mathcal{Q}_\lambda$. Classically,
we might imagine dividing a type $(t_1, \ldots, t_d)$ into a sorted list
of its entries in nonincreasing order (analogous to $\lambda$) and the $S_d$
permutation that maps this sorted list into $t$ (analogous to the
$\mathcal{Q}_\lambda$ register). Quantumly, we
will see that for states of the form $\rho^{\otimes n}$, the $\lambda$ register carries
information about the eigenvalues of $\rho$ and the $\mathcal{Q}_\lambda$ register is
determined by the eigenbasis of $\rho$.

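The main thing we will need to know about $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$
is their dimension. Roughly speaking, if $d$ is constant then
$|\mathcal{I}_{d,n}| \leq \binom{n+d-1}{n} \leq \mathrm{poly}(n)$, $\dim \mathcal{Q}_\lambda \leq \mathrm{poly}(n)$ and
$\dim \mathcal{P}_\lambda \approx \exp(nH(\bar\lambda))$, where $\bar\lambda := \lambda/n$. For completeness, we also state
exact formulas for the dimensions of $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$, although
we will not need to use them. For $\lambda \in \mathcal{I}_{d,n}$ define $\tilde\lambda :=
\lambda + (d-1, d-2, \ldots, 1, 0)$. Then the dimensions of $\mathcal{Q}_\lambda^d$ and
$\mathcal{P}_\lambda$ are given by [22]

$$\dim \mathcal{Q}_\lambda^d = \frac{\prod_{1 \leq i < j \leq d} (\tilde\lambda_i - \tilde\lambda_j)}{\prod_{m=1}^{d-1} m!} \qquad (55)$$

$$\dim \mathcal{P}_\lambda = \frac{n!}{\tilde\lambda_1! \, \tilde\lambda_2! \cdots \tilde\lambda_d!} \prod_{1 \leq i < j \leq d} (\tilde\lambda_i - \tilde\lambda_j). \qquad (56)$$

It is straightforward to bound these by [28], [13]

$$\dim \mathcal{Q}_\lambda^d \leq (n+d)^{d(d-1)/2} \qquad (57)$$

$$\binom{n}{\lambda} (n+d)^{-d(d-1)/2} \leq \dim \mathcal{P}_\lambda \leq \binom{n}{\lambda}. \qquad (58)$$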
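
As a sanity check on the exact dimension formulas (55) and (56), one can verify numerically that the Schur decomposition accounts for the whole space, i.e. that $\sum_\lambda \dim \mathcal{Q}_\lambda^d \cdot \dim \mathcal{P}_\lambda = d^n$. The sketch below does this for small $n$ and $d$; the helper names are illustrative, and the formulas are used in the form of the hook-content and Frobenius dimension formulas with $\tilde\lambda = \lambda + (d-1, \ldots, 0)$.

```python
from itertools import combinations
from math import factorial, prod


def partitions(n, d):
    """Yield every partition of n into d non-increasing, non-negative parts."""
    def rec(remaining, parts_left, max_part):
        if parts_left == 0:
            if remaining == 0:
                yield ()
            return
        for first in range(min(remaining, max_part), -1, -1):
            if first * parts_left < remaining:
                break  # later parts are at most `first`, so they cannot fill the rest
            for rest in rec(remaining - first, parts_left - 1, first):
                yield (first,) + rest
    yield from rec(n, d, n)


def shifted(lam):
    """lam~ = lam + (d-1, d-2, ..., 1, 0)."""
    d = len(lam)
    return [lam[i] + (d - 1 - i) for i in range(d)]


def vandermonde(lam):
    """Product over i < j of (lam~_i - lam~_j)."""
    lt = shifted(lam)
    return prod(lt[i] - lt[j] for i, j in combinations(range(len(lt)), 2))


def dim_Q(lam):
    """Dimension of the unitary-group irrep labelled by lam, as in Eq. (55)."""
    return vandermonde(lam) // prod(factorial(m) for m in range(1, len(lam)))


def dim_P(lam):
    """Dimension of the symmetric-group irrep labelled by lam, as in Eq. (56)."""
    denom = prod(factorial(x) for x in shifted(lam))
    return factorial(sum(lam)) * vandermonde(lam) // denom


def total_dimension(n, d):
    """Sum of dim Q * dim P over all partitions of n into d parts."""
    return sum(dim_Q(lam) * dim_P(lam) for lam in partitions(n, d))
```

For instance, with $n = 3$ and $d = 3$ the three partitions $(3,0,0)$, $(2,1,0)$, $(1,1,1)$ contribute $10 \cdot 1 + 8 \cdot 2 + 1 \cdot 1 = 27 = 3^3$.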

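Applying Eq. (43) to Eq. (58) yields the more useful

$$\exp(nH(\bar\lambda)) (n+d)^{-d(d+1)/2} \leq \dim \mathcal{P}_\lambda \leq \exp(nH(\bar\lambda)). \qquad (59)$$

To relate this to quantum states, let $\Pi_\lambda$ denote the projector
onto $\mathcal{Q}_\lambda^d \otimes \mathcal{P}_\lambda \subset (\mathbb{C}^d)^{\otimes n}$. Explicitly $\Pi_\lambda$ is given by

$$\Pi_\lambda = U_{\mathrm{Sch}}^\dagger \left( |\lambda\rangle\langle\lambda| \otimes I_{\mathcal{Q}_\lambda} \otimes I_{\mathcal{P}_\lambda} \right) U_{\mathrm{Sch}}. \qquad (60)$$

From the bounds on $\dim \mathcal{Q}_\lambda^d$ and $\dim \mathcal{P}_\lambda$ in Eqs. (57,59), we
obtain

$$\exp(nH(\bar\lambda)) (n+d)^{-d(d+1)/2} \leq \mathrm{Tr}\, \Pi_\lambda \leq \exp(nH(\bar\lambda)) (n+d)^{d(d-1)/2}. \qquad (61)$$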

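As in the classical case, i.i.d. states have a sharply peaked distribution
of $\lambda$ values. Let $r = (r_1, \ldots, r_d)$ be the eigenvalues
of a state $\rho$ and, for $\mu \in \mathbb{Z}^d$, define $r^\mu = r_1^{\mu_1} \cdots r_d^{\mu_d}$. Then
Section 6.2 of [25] proved that one can bound
$\mathrm{Tr}\, \Pi_\lambda \rho^{\otimes n} \Pi_\lambda = \mathrm{Tr}\, \mathbf{q}_\lambda(\rho) \cdot \dim \mathcal{P}_\lambda$ by

$$\exp(-nD(\bar\lambda \| r)) (n+d)^{-d(d+1)/2} \leq \mathrm{Tr}\, \Pi_\lambda \rho^{\otimes n} \Pi_\lambda \leq \exp(-nD(\bar\lambda \| r)) (n+d)^{d(d-1)/2}. \qquad (62)$$

Similarly, we have $\Pi_\lambda \rho^{\otimes n} = \rho^{\otimes n} \Pi_\lambda = \Pi_\lambda \rho^{\otimes n} \Pi_\lambda$ and

$$\Pi_\lambda \rho^{\otimes n} \Pi_\lambda \leq r^\lambda \Pi_\lambda = \exp[-n(H(\bar\lambda) + D(\bar\lambda \| r))] \, \Pi_\lambda. \qquad (63)$$

For some values of $\mu$, $r^\mu$ can be much smaller, so we
cannot express any useful lower bound on the eigenvalues of
$\Pi_\lambda \rho^{\otimes n} \Pi_\lambda$, like we can with classical types. Of course, tracing
out $\mathcal{Q}_\lambda^d$ gives us a maximally mixed state in $\mathcal{P}_\lambda$, and this is
the quantum analogue of the fact that $P^{\otimes n}(\cdot | t)$ is uniformly
distributed over $T_t$.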
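We can also define the typical projector

$$\Pi^n_{r,\delta} = \sum_{\lambda : \|\bar\lambda - r\|_1 \leq \delta} \Pi_\lambda = U_{\mathrm{Sch}}^\dagger \left[ \sum_{\lambda : \|\bar\lambda - r\|_1 \leq \delta} |\lambda\rangle\langle\lambda| \otimes I_{\mathcal{Q}_\lambda} \otimes I_{\mathcal{P}_\lambda} \right] U_{\mathrm{Sch}}. \qquad (64)$$

Using Pinsker's inequality, we find that

$$\mathrm{Tr}\, \Pi^n_{r,\delta} \rho^{\otimes n} \geq 1 - \exp\left(-\frac{n\delta^2}{2}\right) (n+d)^{d(d+1)/2}, \qquad (65)$$

similar to the classical case. The typical subspace is defined
to be the support of the typical projector. Its dimension can
be bounded (using Eqs. (65,48)) by

$$\mathrm{Tr}\, \Pi^n_{r,\delta} \leq |\mathcal{I}_{d,n}| \max_{\lambda : \|\bar\lambda - r\|_1 \leq \delta} \mathrm{Tr}\, \Pi_\lambda \leq (n+d)^{d(d+1)/2} \exp\big(n[H(r) + \eta(\delta) + \delta \log d]\big). \qquad (66)$$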

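2) Decomposition of memoryless quantum channels: The
point of introducing the Schur formalism is to decompose
$\mathcal{N}^{\otimes n}$ (or more accurately, its isometric extension $U_{\mathcal{N}}^{\otimes n}$) into
a superposition of flat sub-channels. This is accomplished
by splitting $A^n$, $B^n$ and $E^n$ each into $\lambda$, $\mathcal{Q}_\lambda$ and $\mathcal{P}_\lambda$ subsystems
labelled $\lambda_A$, $\mathcal{Q}_{\lambda_A}$, $\mathcal{P}_{\lambda_A}$, etc. Then the map from
$\mathcal{P}_{\lambda_A} \to \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E}$ commutes with the action of $S_n$ and as
a result has the desired property of being flat.

To prove this more rigorously, a general isometry from
$A^n \to B^n E^n$ can be written as a sum of terms of
the form $|\lambda_B, \lambda_E\rangle\langle\lambda_A| \otimes |q_B, q_E\rangle\langle q_A| \otimes P_{\lambda_A,\lambda_B,\lambda_E}$, where
$|q_A\rangle$, $|q_B\rangle$, $|q_E\rangle$ are basis states for the respective $\mathcal{Q}_\lambda$ registers
and $P_{\lambda_A,\lambda_B,\lambda_E}$ is a map from $\mathcal{P}_{\lambda_A} \to \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E}$.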

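Since $U_{\mathcal{N}}^{\otimes n}$ commutes with the action of $S_n$, it follows
that each $P_{\lambda_A,\lambda_B,\lambda_E}$ must also commute with the action of
$S_n$. Specifically, for any $\lambda_A, \lambda_B, \lambda_E \in \mathcal{I}_{d,n}$ (with $d =
\max(d_A, d_B, d_E)$) and any $s \in S_n$, we have

$$(\mathbf{p}_{\lambda_B}(s) \otimes \mathbf{p}_{\lambda_E}(s))^\dagger \, P_{\lambda_A,\lambda_B,\lambda_E} \, \mathbf{p}_{\lambda_A}(s) = P_{\lambda_A,\lambda_B,\lambda_E}.$$

Thus $\mathbf{p}_{\lambda_A}(s)^\dagger P_{\lambda_A,\lambda_B,\lambda_E}^\dagger P_{\lambda_A,\lambda_B,\lambda_E} \mathbf{p}_{\lambda_A}(s) =
P_{\lambda_A,\lambda_B,\lambda_E}^\dagger P_{\lambda_A,\lambda_B,\lambda_E}$ for all $s \in S_n$, and by Schur's
Lemma $P_{\lambda_A,\lambda_B,\lambda_E}^\dagger P_{\lambda_A,\lambda_B,\lambda_E}$ is proportional to the identity
on $\mathcal{P}_{\lambda_A}$. Therefore $P_{\lambda_A,\lambda_B,\lambda_E}$ is proportional to an isometry.
Furthermore, $P_{\lambda_A,\lambda_B,\lambda_E}$ maps the maximally mixed state on
$\mathcal{P}_{\lambda_A}$ to a state proportional to $P_{\lambda_A,\lambda_B,\lambda_E} P_{\lambda_A,\lambda_B,\lambda_E}^\dagger$. This
state commutes with $\mathbf{p}_{\lambda_B}(s) \otimes \mathbf{p}_{\lambda_E}(s)$ for all $s \in S_n$, and
so, if we again use Schur's Lemma, we find that the reduced
states on $\mathcal{P}_{\lambda_B}$ and $\mathcal{P}_{\lambda_E}$ are both maximally mixed. Therefore
$P_{\lambda_A,\lambda_B,\lambda_E}$ is proportional to a flat isometry.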



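To make this notation more concise, we let
$\mathrm{Hom}(\mathcal{P}_{\lambda_A}, \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E})^{S_n}$ denote the set of maps from
$\mathcal{P}_{\lambda_A}$ to $\mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E}$ that commute with $S_n$. As we have
argued, $P_{\lambda_A,\lambda_B,\lambda_E}$ always belongs to this set. Let $\{P^\mu_{\lambda_A,\lambda_B,\lambda_E}\}_\mu$ be
an orthonormal basis for $\mathrm{Hom}(\mathcal{P}_{\lambda_A}, \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E})^{S_n}$. Let $S$
denote the natural map from $\mathrm{Hom}(\mathcal{P}_{\lambda_A}, \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E})^{S_n} \otimes \mathcal{P}_{\lambda_A}$
to $\mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E}$. We also let $T_A$ denote the set of pairs $(\lambda_A, q_A)$,
where $|q_A\rangle$ runs over some fixed orthonormal basis of $\mathcal{Q}_{\lambda_A}$,
and similarly we define $T_B$ and $T_E$. This allows us to
represent $U_{\mathcal{N}}^{\otimes n}$ as

$$U_{\mathcal{N}}^{\otimes n} = \sum_{t_A \in T_A} \sum_{t_B \in T_B} \sum_{t_E \in T_E} \sum_\mu c_{t_A,t_B,t_E,\mu} \, |t_B, t_E\rangle\langle t_A| \otimes P^\mu \qquad (67)$$

$$= \sum_{t_A \in T_A} \sum_{t_B \in T_B} \sum_{t_E \in T_E} \sum_\mu c_{t_A,t_B,t_E,\mu} \, |t_B, t_E\rangle\langle t_A| \otimes S|\mu\rangle. \qquad (68)$$

Here we use $P^\mu$ and $S|\mu\rangle$ interchangeably to denote the
appropriate map in $\mathrm{Hom}(\mathcal{P}_{\lambda_A}, \mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E})^{S_n}$.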

Remark: We will not need to know anything more about
the representation-theoretic structure of $P_{\lambda_A,\lambda_B,\lambda_E}$, but the
interested reader can find a more detailed description of this
decomposition of $U_{\mathcal{N}}^{\otimes n}$ in Section 6.4 of [25].

We now have a situation largely parallel to the classical
theory of joint types: all but poly(n) dimensions are described
by the flat isometries $P_{\lambda_A,\lambda_B,\lambda_E}$. Next we need to
describe an analogue of joint typicality, so that we can restrict
our attention to triples $(\lambda_A, \lambda_B, \lambda_E)$ that contribute non-negligible
amounts of amplitude to $U_{\mathcal{N}}^{\otimes n}$. In the next section,
we will argue that $c_{t_A,t_B,t_E,\mu}$ is exponentially small unless
$(\lambda_A, \lambda_B, \lambda_E)$ correspond to the possible spectra of marginals
of some state $\psi^{RBE}$ that is obtained by applying $U_{\mathcal{N}}$ to a pure
state on $RA$.

Probably two diagrams would be suitable here: one showing
how the Schur basis gets mapped in general, and one that
specializes to $S_n$-invariant channels, like $U_{\mathcal{N}}^{\otimes n}$.

3) Jointly typical projectors in the Schur basis: In order for
Eq. (68) to be useful, we need to control the possible triples
$(t_A, t_B, t_E)$ that can have non-negligible weight in the sum. In
fact, it will suffice to bound which triples $(\lambda_A, \lambda_B, \lambda_E)$ appear,
since these determine the dimensions of the $\mathcal{P}_\lambda$ registers and
in turn determine the dominant part of the communication
cost. For large values of $n$, almost all of the weight will be
contained in a small set of typical triples $(\lambda_A, \lambda_B, \lambda_E)$.
These triples are the quantum analogue of joint types from
classical information theory.

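Let $\rho^A$ be an arbitrary channel input, and $|\psi\rangle^{RBE} =
(I^R \otimes U_{\mathcal{N}}^{A \to BE})|\Phi_\rho\rangle^{RA}$ the purified channel output. Now
define $R(\mathcal{N})$ to be the set of $\psi^{RBE}$ that can be generated in this
manner. Further define $T^*_{\mathcal{N}}$ to be $\{(r_A, r_B, r_E) : \exists \psi^{RBE} \in
R(\mathcal{N})$ s.t. $r_A = \mathrm{spec}(\psi^R), r_B = \mathrm{spec}(\psi^B), r_E = \mathrm{spec}(\psi^E)\}$.
This set is simply the set of triples of spectra that can arise
from one use of the channel. We will argue that it corresponds
as well to the set of $(\lambda_A, \lambda_B, \lambda_E)$ onto which a channel's input
and output can be projected with little disturbance. Let $T^n_{\mathcal{N},\delta}$
denote the set

$$\{(\lambda_A, \lambda_B, \lambda_E) : \exists (r_A, r_B, r_E) \in T^*_{\mathcal{N}}, \; \|\bar\lambda_A - r_A\|_1 + \|\bar\lambda_B - r_B\|_1 + \|\bar\lambda_E - r_E\|_1 \leq \delta\}. \qquad (69)$$

Next, we define the jointly-typical projector

$$\Pi^n_{\mathcal{N},\delta} = \sum_{(\lambda_A, \lambda_B, \lambda_E) \in T^n_{\mathcal{N},\delta}} \Pi_{\lambda_A} \otimes \Pi_{\lambda_B} \otimes \Pi_{\lambda_E}.$$

Then the following lemma was proven in Section 6.4.3 of [25].
Lemma 10 ([25]): Let $d = \max(d_A, d_B, d_E)$. For any state
$|\Psi\rangle^{R^n B^n E^n} = (I \otimes U_{\mathcal{N}})^{\otimes n} |\varphi\rangle$,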

(*|n^_5|*) > 1 - rf^'^^^ exp(-n^2-)_ 

D. Reduction to the flat spectrum case

In this section we prove the coding theorem for the QRST.
The outline of the proof is as follows:

• We show that general inputs can be replaced by $S_n$-invariant
inputs with asymptotically no increase in communication
rate. This simply uses randomness recycling.

• We show that $S_n$-invariant inputs decompose into a superposition
of flat sub-channels. This is based on Sec. IV-C2.

• We show that atypical sub-channels can be ignored with
negligible error (using Sec. IV-C3).

• We paste together simulations of different flat channels
using entanglement spread (introduced in Sec. II-C).

We now explain these components in more detail. First, we
show how it is possible to assume without loss of generality
that our inputs are $S_n$-symmetric. Choose $m$ such that
$\log(m!) \approx \delta n$; i.e. $m \approx \delta n / \log(n)$. Then we will break the
input into $n/m$ blocks, each of size $m$. Using $\log(m!)$ bits of
shared randomness (which can be simulated with cbits, ebits
or qubits), Alice and Bob can permute the inputs and outputs
of each block.
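
The choice of block size can be made concrete with a few lines of code. The sketch below finds the largest $m$ whose permutation label fits within a budget of $\delta n$ bits; the function name, the use of base-2 logarithms, and the rule of taking the largest admissible $m$ are assumptions made for illustration, not details fixed by the text.

```python
import math


def block_size(n: int, delta: float) -> int:
    """Largest m such that log2(m!) <= delta * n, i.e. the largest block
    whose random permutation can be described within the bit budget."""
    budget = delta * n
    m, log_fact = 1, 0.0          # log2(1!) = 0
    while log_fact + math.log2(m + 1) <= budget:
        m += 1
        log_fact += math.log2(m)  # now log_fact = log2(m!)
    return m
```

For example, a budget of $\delta n = 10$ bits allows blocks of size $m = 6$, since $\log_2(6!) \approx 9.5$ while $\log_2(7!) \approx 12.3$.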

In order to make the randomness cost sublinear in n, we 
need to reuse the same log(m!) random bits for each block. 
Apart from the small errors induced by our protocol, the 
randomness will remain independent of the output, and thus 
can be reused from block to block. This was stated formally in 
Lemma 4.7 of [18], where we need to use the fact that (in the 
language of [18]) the random bits are incoherently decoupled 
from the rest of protocol. 

One effect of this randomness recycling is that errors in
separate blocks may become correlated. However, the total 
error amplitude is still at most the sum of the error am- 
plitudes in each block. Thus, the total error will scale as 
(n/m)e~'"^ poly(m) ~ exp(— n(5/log(n)). 

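For the rest of this section, we simply assume that Alice
is given half of an $S_n$-invariant input $|\varphi\rangle^{R^n A^n}$. Based on
Sec. IV-C2, we can decompose the action of $U_{\mathcal{N}}^{\otimes n}$ into a map
from $t_A$ to $t_B, t_E, \mu$, followed by a map from $p_A$ and $\mu$ to
$p_B, p_E$. The $t_B$ register has only poly(n) dimension, and can
be transmitted uncompressed to Bob using $O(\log n)$ qubits.
On the other hand, the map $P^\mu$ is flat, and therefore can be
compressed using Lemma 9.

To understand the costs of compressing $P^\mu$, we need to
estimate the dimensions of the $\mathcal{P}_{\lambda_A}$, $\mathcal{P}_{\lambda_B}$, $\mathcal{P}_{\lambda_E}$ registers. In
Sec. IV-C1, we showed that $\dim \mathcal{P}_\lambda \approx \exp(nH(\bar\lambda))$ up to
poly(n) factors (with $\bar\lambda = \lambda/n$). So the cost of simulating a flat map from
$\mathcal{P}_{\lambda_A}$ to $\mathcal{P}_{\lambda_B} \otimes \mathcal{P}_{\lambda_E}$ is $\frac{1}{2} n (H(\bar\lambda_A) + H(\bar\lambda_B) - H(\bar\lambda_E)) + O(\log n)$ qubits
and $\frac{1}{2} n (H(\bar\lambda_B) + H(\bar\lambda_E) - H(\bar\lambda_A)) + O(\log n)$ ebits.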

Next, we can relate these costs to entropic quantities. Using
Lemma 10 from Sec. IV-C3, it follows that we need only
consider the triples $(\lambda_A, \lambda_B, \lambda_E)$ within distance $O(\delta)$ of a
spectral triple $(r_A, r_B, r_E)$ corresponding to a possible channel
output. Therefore, the problem of simulating $\mathcal{N}^{\otimes n}$ can be
reduced to producing a superposition of

$$\left(\tfrac{1}{2} n I(R;B)_\rho + O(n\delta + \log n)\right)[q \to q] + \left(\tfrac{1}{2} n I(E;B)_\rho + O(n\delta + \log n)\right)[qq] \qquad (70)$$

for all possible single-letter inputs $\rho$. If we take $\delta \to 0$ as
$n \to \infty$ then this corresponds to an asymptotic rate of

$$\tfrac{1}{2} I(R;B)_\rho [q \to q] + \tfrac{1}{2} I(E;B)_\rho [qq] \qquad (71)$$

per channel use.

Finally, producing Eq. (70) in superposition may require
entanglement spread. Suppose that $\alpha \stackrel{\mathrm{clean}}{\geq} \beta$ and $\beta \geq
\frac{1}{2} I(R;B)_\rho [q \to q] + \frac{1}{2} I(E;B)_\rho [qq]$ for all $\rho$. Then we can
prepare $\beta$ from $\alpha$ and then use $\beta$ to produce the resources
needed to simulate $\langle \mathcal{N}_F : \rho \rangle$ in superposition across all $\rho$ (or
equivalently across all $t_A$ in the input).

When we do not need to simulate feedback, the main difference
is that we can split the $E$ register into a part for Alice
($E_A$) and a part for Bob ($E_B$). Additionally, this splitting is not
restricted to be i.i.d., although the corresponding "additivity"
question here remains open. That is, for any $n \geq 1$ and any
$V : E^n \to E_A E_B$, simulating the action of $\mathcal{N}^{\otimes n}$ can be
achieved by simulating $V \circ U_{\mathcal{N}}^{\otimes n}$. Here Alice gets the output
$E_A$ and Bob gets the output $B^n E_B$. Moreover, we are in
some cases able to break the superpositions between different
$t_A$. If feedback is not required, then we can assume without
loss of generality that Alice has measured $t_A$, estimated $\mathcal{N}(\rho)$
to within $O(n^{-1/2})$ accuracy [39] and communicated the
resulting estimate to Bob using $o(n)$ communication.

However, in some cases (including an example we will
describe in the next section), $\mathcal{N}(\rho)$ does not uniquely determine
$\hat{\mathcal{N}}(\rho)$, and thereby determine the rate of entanglement
needed. In this case, it will suffice to prepare a superposition
of entanglement corresponding to any source in $(\mathcal{N}^{\otimes n})^{-1}(\omega)$
for each $\omega \in \mathrm{range}(\mathcal{N}^{\otimes n})$. This yields the communication
cost claimed in Theorem 3.

E. Converses and strong converses 

As with the classical reverse Shannon theorem, the existence 
of a coding theorem (this time for entanglement-assisted ca- 
pacity [10]) means that no better simulation is possible, at least 
not for i.i.d. inputs. In fact, such matching coding theorems 
generally give us strong converses, implying that attempting to 
simulate a channel at a rate $\delta$ lower than necessary results in
an error rate > 1 - poly(n) • 2""'''/ . (The log(n) factor is 
due to our use of recycled shared randomness to symmetrize
the inputs. We believe that it could be removed with a tighter
analysis.)

[Figure 7 diagram: "QRST in Schur Basis, for known tensor power source". Purification of a tensor power state $\rho^{\otimes n}$ input to $n$ instances of quantum channel $\mathcal{N}$.]

Fig. 7. Quantum protocol for quantum reverse Shannon theorem on a known
tensor power source. Alice transforms the tensor power input into the Schur
representation, comprising a small $\lambda$ register, a small $q$ register, and a large $p$
register containing the irrep. These registers, together with a slight ($O(\sqrt{n})$)
excess of halves of ebits shared with Bob, are coherently transformed into
about $\frac{1}{2} n I(R;B)$ qubits worth of flat sub-channel codes representing Bob's
$p$ register, which Bob decodes with the help of the other halves of the shared
ebits and the small $\lambda_B$ and $q_B$ registers sent from Alice. Alice also returns the
($O(\sqrt{n})$) unused halves of ebits, allowing them to be coherently destroyed.
The remaining registers $\lambda_E$, $q_E$, and $p_E$, representing Eve's share of the
output, remain with Alice, as required for a quantum feedback simulation of
the channel $\mathcal{N}^{\otimes n}$. By discarding them into the environment, one obtains a (not
necessarily efficient) non-feedback simulation.

[Figure 8 diagram: "Embezzlement-assisted QRST in Schur Basis, for general source and channel". Purification of a general state $\rho$ input to $n$ instances of quantum channel $\mathcal{N}$. Panel labels: Alice's Lab; random permutation to symmetrize input; Bob's Lab.]

Fig. 8. QRST on a general input using an entanglement-embezzling state
(green). Alice first randomly permutes the inputs into $n$ instances of her
quantum channel, using random information shared with Bob (magenta),
rendering the overall input permutation-symmetric. She then uses the $\lambda_B$
register to embezzle the correct superposition of (possibly very) different
amounts of entanglement needed by her sub-channel encoder, leaving a negligibly
degraded embezzling state behind. At the receiving end (lower right) Bob
performs his half of the embezzlement, coherently decodes the sub-channel
codes, and undoes the random permutation. The shared randomness needed
for the initial random permutation can also be obtained from the embezzling
state.

[Figure 9 diagram: "Back communication-assisted QRST in Schur Basis, for general source and channel". Purification of a general state $\rho$ input to $n$ instances of quantum channel $\mathcal{N}$.]

Fig. 9. QRST using classical back communication. Here the requisite spread
is generated by starting with a large amount of ordinary entanglement, then
using back communication and the $\lambda_B$ register to coherently burn off some
of it. This requires the $\lambda_B$ register to make a round trip from Alice to Bob
then back again, before finally returning to Bob, who needs to be holding it
at the end. Other aspects of the protocol are as in the embezzlement-assisted
implementation of the preceding figure.

In this case, we are able to establish such a strong
converse only in the case of a feedback channel with a
known i.i.d. input. Here it is not only known [18] that
$\langle \mathcal{N}_F : \rho \rangle \geq \frac{1}{2} I(R;B)[q \to q] + \frac{1}{2} I(B;E)[qq]$, but the
corresponding protocol can be shown to have error bounded
by $2^{-n\delta}$. Thus, suppose a simulation existed for $\langle \mathcal{N}_F : \rho \rangle$
that used $\frac{1}{2}(I(R;B) - \delta)[q \to q]$ and an unlimited amount
of entanglement and back communication to achieve fidelity
$f$. Then combining this simulation with teleportation and our
coding protocol would give a method for using cbits at rate
$I(R;B) - \delta$ together with entanglement to simulate cbits at
rate $I(R;B) - \delta/2$ with fidelity $\geq f - 2^{-n\delta'}$ for some $\delta' > 0$.
By causality, any such simulation must have fidelity $\leq 2^{-n\delta/2}$,
and thus we must have $f \leq 2^{-n\delta/2} + 2^{-n\delta'}$.

Similar arguments can be used to show a strong converse for 
the entanglement-assisted coding theorem as well. Previously 
this was known to hold only when considering product-state
inputs [44], [50] or restricted classes of channels [41]. In
fact, this strong converse also applies in the setting where 
free communication is allowed from Bob to Alice. This is 
somewhat surprising, given that back communication is known 
to increase the classical capacity in the unassisted case [6]. 

When we consider simulations without feedback, we no 
longer have additivity or strong converses. Here we are 
able only to establish regularized coding theorems. The 
zero-entanglement limit is discussed in [29] and the low- 
entanglement regime (part (b) of Theorem 3) follows similar 
lines. The main idea is that if only coherent resources (such 
as qubits and ebits) are used, then the state of the environment 
is entirely comprised of what Alice and Bob discard. Let
$E_A$ (resp. $E_B$) denote the system that Alice (resp. Bob)
discards. Then if the simulation has $1 - \epsilon$ fidelity with the
true channel, Uhlmann's theorem implies that there exists an
isometry from $E_A E_B$ to $E^n$ such that $R^n B^n E^n$ has $1 - \epsilon$
fidelity with the output of $U_{\mathcal{N}}^{\otimes n}$. Now we obtain Eqs. (17,18)
from Fannes's inequality. Before discarding $E_B$, Bob's total
state $B^n E_B$ is within $\epsilon$ of a state on $q + e$ qubits, and thus
has $H(B^n E_B) \leq q + e + O(n\epsilon)$. Similarly, Bob has received
only $q$ qubits, so we must have $\frac{1}{2} I(R^n; B^n E_B) \leq q + O(n\epsilon)$.

Finally, for general inputs we need to argue that entanglement
spread is necessary. In the feedback case, we will
consider an input for Alice of the form

$$\frac{\rho_1^{\otimes n} + \rho_2^{\otimes n} + \rho_3^{\otimes n}}{3},$$

where $\rho_1$ maximizes $I(R;B)/2$, $\rho_2$ maximizes $H(B)$
and $\rho_3$ maximizes $-H(B|R)$. (This is shorthand for
saying that Alice gets one half of some purification,
such as $\left(|1\rangle (|\Phi_{\rho_1}\rangle^{RA})^{\otimes n} + |2\rangle (|\Phi_{\rho_2}\rangle^{RA})^{\otimes n} +
|3\rangle (|\Phi_{\rho_3}\rangle^{RA})^{\otimes n}\right)/\sqrt{3}$.) In the non-feedback case, we similarly
consider the input $(\rho_1 + \rho_2 + \rho_3)/3$, where $\rho_1, \rho_2, \rho_3$
are the respective maximizing states within $(\mathcal{N}^{\otimes n})^{-1}(\omega)$ for
a fixed $\omega \in \mathrm{range}(\mathcal{N}^{\otimes n})$. We will present our arguments for
the feedback case, but the non-feedback case can be handled
similarly.

An accurate channel simulation should work simultaneously
for $\rho_1, \rho_2, \rho_3$ while not breaking the superposition between
these components. As a result, if we perform the simulation
using the resources in Eq. (29) then we can obtain the required
constraints on $Q_1, Q_2, C, E$. Since $Q_1$ is the only source of
forward communication, we always have $Q_1 \geq Q(\mathcal{N}) =
\frac{1}{2} n I(R;B)_{\rho_1}$. Next, our protocol needs to work coherently
on states $\rho_2$ and $\rho_3$. The resulting constraints are the same as
those derived at the end of Sec. II-C, where instead of Fannes'
inequality we will use Theorem 4.